EXECUTIVE SUMMARY


After more than eight years of the War on Terrorism, Improvised Explosive
Devices (IEDs) have become the weapon of choice for terrorists in Iraq and
Afghanistan. IEDs have accounted for the majority of casualties among Allied forces and
civilians. One reason for the proliferation of IEDs is the ease of access to training
material on the Internet. The Internet is a cheap, convenient, yet powerful tool
for accessing a vast reservoir of information and knowledge. Unfortunately, the Internet also
empowers technology-savvy terror networks and extremist groups to create IED
education networks and distribute IED know-how to their operatives and supporters
quickly and efficiently.

One  solution  to  counter  this  problem  is  a  social  networking  tool  that  applies 
networking  theory  and  social  network  analysis  to  identify  terrorist  IED  education 
networks  quickly.  This  tool  would  utilize  an  open  source  web  crawler  that  could  index 
Arabic  websites  into  a  searchable  database  for  analyzing  and  querying  to  collect  more 
actionable  intelligence. 

The Nutch project was selected as the search engine for this social
networking tool. Its transparent ranking information lets users tailor the ranking
to their specific requirements. Its versatile plug-in
architecture provides extensibility, flexibility and maintainability.

To enable Nutch to index Arabic websites, an Arabic language analyzer needs
to be added to Nutch’s library. Multiple experiments were run to test the performance
of the Arabic language analyzer, with moderate results.

Overall, Nutch with an added Arabic analyzer would be a valuable addition to
an existing social networking tool, enabling it to perform page correlation and analysis
of Arabic websites. Its results could be used to identify IED education networks and to
collect open source intelligence.
I.  INTRODUCTION


A.  MOTIVATION 

Since its invention, the Internet has revolutionized communication. It enables
people to exchange ideas and share information rapidly and cheaply. Unfortunately, its
lack of regulation and its pervasive reach have also turned it into a new tool for
tech-savvy terrorists: “Today, almost without exception, all major (and many minor)
terrorist and insurgent groups have web sites” [1]. Many terror organizations, such as
Al-Qaeda, actively use the Internet to recruit new members, solicit donations from
sympathizers, and spread propaganda.

They also turn the Internet into a virtual training ground, offering tutorials on
building IEDs and planning attacks. These training materials are easily accessible to
anyone with an Internet connection. This accessibility is the main contributor to the
explosion of IED attacks in Iraq and Afghanistan. To counter the proliferation of IED
technology, these IED education networks need to be identified, monitored and referred
to sovereign authorities for further action as necessary.

One  possible  solution  for  this  problem  is  a  social  networking  tool  that  applies 
network  science  to  identify  the  IED  education  network  via  the  World  Wide  Web.  In  [2], 
network  science  is  defined  as  the  study  of  networks  which  “contrasts,  compares,  and 
integrates  techniques  and  algorithms  developed  in  disciplines  as  diverse  as  mathematics, 
statistics,  physics,  social  network  analysis,  information  science  and  computer  science.” 
The  social  network  tool  would  incorporate  an  open  source  web  crawler  that  could  index 
Arabic  websites  into  a  searchable  database  for  analyzing  and  querying. 

B.  RESEARCH  OBJECTIVES 

The research objective of this thesis was to enhance a Web crawler engine with
an Arabic search capability so that it could index Arabic-language websites proficiently,
thus improving an existing social networking tool’s ability to perform page correlation
and analysis of Arabic websites. The enhanced Web crawler could help speed up the
analytical process of the social networking tool so that it can effectively identify IED
education networks via the World Wide Web.

C.  THESIS  ORGANIZATION 

This thesis consists of six chapters. Chapter I provides an overview of the
motivation, objectives and thesis organization. Chapter II briefly discusses information
retrieval, the challenges of Arabic information retrieval, stemming in Arabic
and the light stemmer algorithm. Chapter III introduces Lucene, a scalable
Information Retrieval (IR) library; Nutch, an open source search engine; and Nutch’s
plug-in architecture. Chapter IV discusses the implementation of the
light stemmer algorithm in Lucene’s analyzer library and the development of the
ArabicAnalyzer plug-in. Chapter V compares the performance of ArabicAnalyzer
and NutchDocumentAnalyzer. Chapter VI summarizes the thesis and
recommends future research.
II.  ARABIC INFORMATION RETRIEVAL


A.  INFORMATION  RETRIEVAL 

The fast growth of the Internet, accompanied by the explosion of data available
via the World Wide Web, has made finding useful information a tedious and
difficult task. These difficulties have attracted renewed interest in Information
Retrieval and its techniques.

Information Retrieval (IR) is the science of locating relevant documents in a large
collection of documents. The retrieval process is influenced by the queries supplied by
the user, the indexing process and the natural language that is being indexed [3].

According to [4], popular classic IR strategies include the Vector Space Model,
Probabilistic Retrieval, the Language Model, and Inference Networks.

The Vector Space Model is a widely used retrieval strategy. In this model, both
the query and each document are represented as vectors in the term space. A measure of
similarity between the two vectors is then computed.
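
The vector comparison can be illustrated with a minimal sketch. It assumes raw term counts as vector weights and cosine similarity as the measure; neither choice is prescribed by [4], and the function names are illustrative only.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of raw term counts."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse term vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, documents):
    """Return (doc_id, score) pairs sorted by descending similarity."""
    q = term_vector(query)
    scored = [(doc_id, cosine_similarity(q, term_vector(text)))
              for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Production systems usually weight terms (for example by inverse document frequency) rather than using raw counts; the sketch omits this to keep the geometry visible.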

In the Probabilistic Retrieval model, a probability based on the likelihood that a
term will appear in a relevant document is computed for each term in the collection. For
terms that match between a query and a document, the similarity measure is computed as
the combination of the probabilities of each of the matching terms [4].

In  the  Language  Model,  a  language  model  is  inferred  for  each  document;  then  the 
probability  of  generating  the  query  according  to  each  of  these  models  is  computed. 
Documents  are  then  ranked  according  to  these  probabilities  [5]. 
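
As a rough sketch of this ranking scheme, the following assumes a unigram model per document that is linearly mixed with a collection-wide model so that unseen terms do not zero out the score. The mixing weight `lam` and the function names are assumptions made for illustration, not details taken from [5].

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection, lam=0.5):
    """log P(query | document model), mixed with the collection model."""
    if not collection:
        return float("-inf")
    doc_counts, coll_counts = Counter(doc), Counter(collection)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / len(doc) if doc else 0.0
        p_coll = coll_counts[term] / len(collection)
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term appears nowhere in the collection
        score += math.log(p)
    return score


def rank_by_language_model(query, docs):
    """Order document ids by the likelihood of generating the query."""
    collection = [t for tokens in docs.values() for t in tokens]
    return sorted(
        docs,
        key=lambda d: query_log_likelihood(query, docs[d], collection, 0.5),
        reverse=True,
    )
```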

Inference Networks, also known as Bayesian networks, are used to model
documents, the documents’ contents and the query. The network then uses this information
to derive (“infer”) other relationships. The strength of this inference is then used as the
similarity coefficient [4].
B.  THE CHALLENGES OF ARABIC IR

According to [6], there are over 200 million native Arabic speakers in the world
and over 20 million people who speak it as a second language. Internet use in
Arabic-speaking countries is also growing rapidly. From [7], the number of Internet
users in Middle East countries alone grew from 3 million in 2000 to 58 million in
2009. Demand for Arabic IR is therefore increasing, but Arabic poses
many challenges for IR.

First,  Arabic  has  a  very  complex  morphology  system.  In  [8],  the  authors 
observed: 

Arabic  has  two  genders,  feminine  and  masculine;  three  numbers,  singular, 
dual  and  plural;  and  three  grammatical  cases,  nominative,  genitive,  and 
accusative.  A  noun  has  the  nominative  case  when  it  is  a  subject, 
accusative  when  it  is  the  object  of  a  verb,  and  genitive  when  it  is  the  object 
of  a  preposition. 

Any Arabic IR system must deal with this morphology system, which compounds its
complexity.

Second, Arabic contains many ambiguities. One major contributor
to this phenomenon is that orthographic variations are widespread in Arabic [9]. The
authors gave the example that, when HAMZA is combined with ALEF (أ) or
MADDA with ALEF (آ), the HAMZA or MADDA is sometimes dropped, making it
ambiguous whether the HAMZA or MADDA is present. Another contributor
to the higher level of ambiguity is that vowels (diacritics) are sometimes omitted in
written Arabic, which may change the meaning of the words. This uncertainty
affects the precision and recall of any Arabic IR system.

Finally, broken plurals, the plural forms of irregular nouns, are common in Arabic.
A broken plural’s form does not resemble its singular form, and it does not obey
normal morphological rules. Because of that, it is very difficult to design an algorithm to
transform this kind of plural into its singular form [9].
C.  RESEARCH IN ARABIC IR


Research on Arabic IR has focused on using word roots and stems as index terms.
A stem is the remainder of the word after removing prefixes and suffixes. The
root, by contrast, is the origin of the word that remains after removing nonessential
characters, prefixes and suffixes. Using word roots as index terms requires linguistic
knowledge and an understanding of the language’s morphology, whereas
prior knowledge of the language is not required when using stems as index terms.
In [10], the authors recognized that “stemming is one of many tools besides
normalization that is used in information retrieval to combat the vocabulary mismatch
problem.” As discussed in Section II.B, Arabic is very difficult to stem; as a result, only
a few Arabic stemmers were available.

One  of  the  earliest  stemmers  was  the  root-based  stemmer  proposed  by  Khoja  and 
Garside.  This  stemmer  removed  all  the  stopwords,  punctuation,  and  numbers.  Then  it 
peeled  away  prefixes  and  suffixes.  After  that,  it  matched  the  result  against  a  list  of 
patterns  to  extract  the  root.  Finally,  it  matched  the  extracted  root  against  a  list  of  known 
“valid”  roots.  There  are  a  few  weaknesses  in  the  Khoja  stemmer.  First,  it  can  provide 
wrong  solutions  when  removing  prefixes  and  suffixes.  It  also  can  generate  wrong  roots 
for  words  that  contain  EBDAL  [10],  [11],  [12]. 

Buckwalter’s morphological analyzer is another useful stemmer. First, this
stemmer transliterates the Arabic word into Latin letters. Then, it segments the word into
all possible combinations of prefix, stem, and suffix. After that, it checks every candidate
segmentation against its built-in lexicon libraries (a prefix dictionary, a stem dictionary
and a suffix dictionary). If all the word elements (prefix, stem, suffix) are found in their
respective libraries, three truth tables indicating their legal combinations
(prefixes-suffixes, prefixes-stems, and stems-suffixes) are used to determine whether they
are compatible. If the word elements pass all three truth tables, the segmentation is valid.
This stemmer provides highly reliable results, but it is slow [13].
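
The segment-and-check procedure can be sketched as follows. The lexicons and compatibility tables below are tiny hypothetical stand-ins, not Buckwalter's actual data; the input is assumed to be already transliterated, and the category labels are invented for illustration.

```python
# Hypothetical toy lexicons: surface form -> morphological category.
PREFIXES = {"": "no-prefix", "Al": "definite-article"}
STEMS = {"ktAb": "noun"}
SUFFIXES = {"": "no-suffix", "An": "noun-dual"}

# Hypothetical compatibility ("truth") tables as sets of legal category pairs.
PREFIX_STEM = {("no-prefix", "noun"), ("definite-article", "noun")}
STEM_SUFFIX = {("noun", "no-suffix"), ("noun", "noun-dual")}
PREFIX_SUFFIX = {
    ("no-prefix", "no-suffix"), ("no-prefix", "noun-dual"),
    ("definite-article", "no-suffix"), ("definite-article", "noun-dual"),
}


def segmentations(word, max_prefix=4, max_suffix=6):
    """Yield every (prefix, stem, suffix) split that leaves a non-empty stem."""
    for p in range(0, min(max_prefix, len(word) - 1) + 1):
        for s in range(0, min(max_suffix, len(word) - p - 1) + 1):
            yield word[:p], word[p:len(word) - s], word[len(word) - s:]


def analyze(word):
    """Return the segmentations that pass the lexicons and all three tables."""
    valid = []
    for prefix, stem, suffix in segmentations(word):
        if prefix in PREFIXES and stem in STEMS and suffix in SUFFIXES:
            a, b, c = PREFIXES[prefix], STEMS[stem], SUFFIXES[suffix]
            if ((a, b) in PREFIX_STEM and (b, c) in STEM_SUFFIX
                    and (a, c) in PREFIX_SUFFIX):
                valid.append((prefix, stem, suffix))
    return valid
```

Enumerating every split and consulting three dictionaries per candidate gives some intuition for why this style of analysis is slower than light stemming.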

The light stemmer is another approach for Arabic IR. Most light stemmers, such as
those in [8] and [14], are based on the same idea: extract stems by deleting the most
frequent prefixes and suffixes. These stemmers do not attempt to produce the Arabic root.
This thesis applies the light stemmer algorithm in [14] to give the Web crawler an Arabic
search capability. A more detailed discussion is in the next section.

D.  LIGHT  STEMMER  ALGORITHM 

1.  Introduction 

The light stemmer allows for good information retrieval results without providing
a correct morphological analysis [10]. The light stemmer algorithm can be employed
without specialized knowledge of the language.

2.  The  Algorithm 

The stemmer has two parts: normalization and stemming. The normalization
process normalizes the orthography (the writing system) of the queries and
corpus. The stemming process then removes prefixes and suffixes using the light stemmer
algorithm to extract the stems [14].


a.  Normalization 

In [14], before stemming, corpus and queries are normalized as follows:

(1) Convert to Windows Arabic encoding (CP1256).

(2) Remove punctuation.

(3) Remove diacritics (primarily weak vowels).

(4) Remove non-letters.

(5) Replace آ (ALEF with MADDA above), أ (ALEF with HAMZA above),
    and إ (ALEF with HAMZA below) with ا (ALEF).

(6) Replace final ى (ALEF MAKSURA) with ي (YEH).

(7) Replace final ة (TEH MARBUTA) with ه (HEH).
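
The normalization steps above can be sketched in a few lines. This is an illustrative approximation, not the implementation from [14]: it works on Unicode strings, so the encoding conversion in step (1) is skipped, and it assumes that "non-letters" means any character outside the basic Arabic letter range.

```python
import re

# Arabic diacritics (FATHATAN through SUKUN).
DIACRITICS = re.compile("[\u064B-\u0652]")
# Anything that is not a basic Arabic letter; this also covers punctuation.
NON_LETTERS = re.compile("[^\u0621-\u063A\u0641-\u064A]")
# ALEF with MADDA above, ALEF with HAMZA above, ALEF with HAMZA below.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize_word(word):
    """Apply steps (2) through (7) to a single whitespace-delimited word."""
    word = DIACRITICS.sub("", word)
    word = NON_LETTERS.sub("", word)
    word = ALEF_VARIANTS.sub("\u0627", word)   # bare ALEF
    if word.endswith("\u0649"):                # final ALEF MAKSURA -> YEH
        word = word[:-1] + "\u064A"
    if word.endswith("\u0629"):                # final TEH MARBUTA -> HEH
        word = word[:-1] + "\u0647"
    return word


def normalize(text):
    """Normalize every word in a text, dropping words that become empty."""
    words = (normalize_word(w) for w in text.split())
    return " ".join(w for w in words if w)
```
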
b.  Light Stemmers


After the corpus and queries are normalized, the stemmer is applied as
follows:

(1) Remove و (WAW) if the remainder of the word is three or
    more characters long.

(2) Remove any of the definite articles if this leaves two or
    more characters.

(3) Go through the list of suffixes once in the right-to-left order
    indicated in Figure 1, removing any that are found at the
    end of the word, if this leaves two or more characters.


Stemmer   Remove from front              Remove suffixes

Light1    ال، وال، بال، كال، فال          none

Light2    ال، وال، بال، كال، فال، و       none

Light3    (same as Light2)               ه، ة

Light8    (same as Light2)               ها، ان، ات، ون، ين، يه، ية، ه، ة، ي

Figure 1.  Strings removed by light stemming. From [14]

Light1, Light2, Light3 and Light8 apply the same algorithm in the stemming
process. They differ in the number of prefixes and suffixes that are
removed. Light1 removes five prefixes and no suffixes. Light2
removes six prefixes and no suffixes. Light3
removes six prefixes and two suffixes. Light8 removes
six prefixes and 10 suffixes.
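
A minimal sketch of the three stemming steps in the Light8 configuration is shown below. The prefix and suffix strings are an assumption based on the counts stated above (five definite articles plus WAW, and 10 suffixes); they should be checked against [14] before being relied upon. The input is assumed to be a single, already normalized word.

```python
WAW = "\u0648"

# Definite articles: WAW+AL, BEH+AL, KAF+AL, FEH+AL, AL (assumed list).
ARTICLES = [
    "\u0648\u0627\u0644", "\u0628\u0627\u0644", "\u0643\u0627\u0644",
    "\u0641\u0627\u0644", "\u0627\u0644",
]

# Assumed Light8 suffix list, in the order in which it is scanned.
SUFFIXES = [
    "\u0647\u0627", "\u0627\u0646", "\u0627\u062A", "\u0648\u0646",
    "\u064A\u0646", "\u064A\u0647", "\u064A\u0629", "\u0647",
    "\u0629", "\u064A",
]


def light8_stem(word):
    """Strip WAW, one definite article, then suffixes in a single pass."""
    # Step 1: remove a leading WAW if three or more characters remain.
    if word.startswith(WAW) and len(word) - 1 >= 3:
        word = word[1:]
    # Step 2: remove one definite article if two or more characters remain.
    for article in ARTICLES:
        if word.startswith(article) and len(word) - len(article) >= 2:
            word = word[len(article):]
            break
    # Step 3: go through the suffix list once, removing any suffix found
    # at the end of the word if two or more characters remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[:-len(suffix)]
    return word
```

Because the length guards are the only safeguard, the stemmer needs no dictionary and no morphological knowledge, which is the property that makes it "light".
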
3.  Results


The authors in [14] compared the retrieval effectiveness of the light stemmer
algorithm (Light8) and of a morphological analyzer (the Khoja stemmer). “Raw” in Figure 2
means that neither normalization nor stemming was applied. Figure 2 shows that the light
stemmer outperforms both the Khoja stemmer and raw retrieval. Table 1 shows that the
light stemmer improves average precision by over 90% relative to raw retrieval.

The  authors  concluded  that  stemming  is  very  effective  on  Arabic  IR.  For 
monolingual  retrieval,  the  light  stemmer  has  demonstrated  improvement  of  around  100% 
in  average  precision  due  to  stemming  and  related  processes. 


Figure 2.  Monolingual 11-point precision results. From [14]
Table 1.  The uninterpolated average precision. From [14]

Stemmer          raw     khoja-n   khoja   light8

Av. Precision    .194    .313      .341    .376

Pct. Change              61.7      76.2    94.3

E.  CHAPTER  SUMMARY 

In this chapter, the challenges of Arabic IR and past Arabic IR research were
covered, along with the effectiveness of the light stemmer in Arabic IR. In the next
chapter, Lucene, Nutch and Nutch’s plug-in architecture are introduced.
III.  LUCENE AND NUTCH


A.  INTRODUCTION 

Lucene and Nutch, created by Doug Cutting, are two open-source software
projects. According to [15], Lucene is a high-performance, scalable Information
Retrieval (IR) library that provides Java-based indexing and searching technology and
advanced analysis/tokenization capabilities. Nutch, on the other hand, is a search engine
built on top of Lucene. Together, they make a full-featured search engine
that offers transparency into how Web sites are ranked and an understanding of how a
large search engine works [16].

B.  LUCENE 

1.  Overview 

Lucene is a software library that enables users to add indexing and searching
capabilities to their applications. Lucene can index and search any type of data as long as
it can be converted into a text format. This means Lucene can be used to search Web
pages, PDF files, and Microsoft® Word files because textual information can be extracted
from them. This flexibility makes Lucene well suited as the toolkit for a search engine.

2.  Indexing  Process 

Indexing  is  the  process  of  converting  text  into  an  index,  a  data  structure  that 
improves  the  speed  of  data  retrieval  operations.  The  index  is  the  fundamental  component 
of  Lucene. 

From [16], to index data with Lucene, the data must first be converted into
plain text, a format that Lucene can process. Lucene then prepares the data
for indexing by breaking the plain text into chunks, or tokens, and performing a
number of operations on them. For instance, the tokens could be lowercased before
indexing to make searches case-insensitive. This step is called analysis. After the
input has been analyzed, it is ready to be added to the index. The indexing process is
illustrated in Figure 3.


Figure  3.  Lucene  indexing  architecture.  From  [17] 
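
The flow from plain text through analysis to the index can be approximated with a toy inverted index. This is a conceptual sketch only and does not use or mirror Lucene's API; the class and function names are invented for illustration.

```python
from collections import defaultdict


def analyze(text):
    """Analysis step: split plain text into tokens and lowercase them."""
    return [token.lower() for token in text.split()]


class InvertedIndex:
    """Maps each term to the set of ids of documents that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_document(self, doc_id, text):
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every query term."""
        terms = analyze(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

Running the same `analyze` function over both documents and queries is what makes the lowercasing example above yield case-insensitive search.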

Lucene  implements  an  innovative  approach  to  maintaining  the  index — rather  than 
maintaining  a  single  index,  Lucene  builds  multiple  index  segments  and  merges  them 
periodically.  Using  segments  allows  a  quick  way  to  add  new  documents  to  the  index  by 
adding  them  to  the  newly  created  index  segments  and  only  periodically  merging  them 
with  other  existing  segments.  This  process  makes  additions  efficient  because  it 
minimizes  physical  index  modifications. 
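
The segment idea can be sketched as follows, under simplifying assumptions: each added document becomes its own small segment, and all segments are merged into one whenever their number reaches a hypothetical `merge_factor`. Lucene's real merge policy is more elaborate; this only illustrates why additions are cheap.

```python
class SegmentedIndex:
    """Toy index that appends small segments and merges them occasionally."""

    def __init__(self, merge_factor=3):
        self.merge_factor = merge_factor
        self.segments = []  # each segment maps term -> set of doc ids

    def add_document(self, doc_id, terms):
        # Adding a document only creates a new segment; existing segments
        # are left untouched.
        segment = {}
        for term in terms:
            segment.setdefault(term, set()).add(doc_id)
        self.segments.append(segment)
        if len(self.segments) >= self.merge_factor:
            self._merge()

    def _merge(self):
        # Periodic merge: fold all segments into a single segment.
        merged = {}
        for segment in self.segments:
            for term, doc_ids in segment.items():
                merged.setdefault(term, set()).update(doc_ids)
        self.segments = [merged]

    def search(self, term):
        # A search consults every segment and unions the hits.
        hits = set()
        for segment in self.segments:
            hits |= segment.get(term, set())
        return hits
```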

Some IR libraries need to index the whole corpus again when new data is added
to their index; Lucene does not, because it supports incremental indexing.
This means Lucene allows the contents of newly added documents to be searchable
immediately without indexing the whole corpus again [15].
3.  Analyzer


As discussed above, analysis is a very important step in the indexing process. It
converts a field of text into its most fundamental indexed representation, terms. These
terms are used to determine which documents match a query during searches.

An  analyzer  is  an  encapsulation  of  the  analysis  process.  The  analyzer’s  job  is  to 
process  strings  of  text  into  a  stream  of  tokens  by  performing  any  number  of  operations  on 
them.  Lucene  includes  several  built-in  analyzers  that  do  a  good  job  at  analyzing  English- 
based  text.  For  analyzing  non-English  languages,  specific  language  analyzers  are  needed. 
Lucene’s  core  API  provides  building  blocks  to  create  custom  language  analyzers. 
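
Conceptually, an analyzer is a tokenizer followed by a chain of token filters. The sketch below models that chain with Python generators; it is not Lucene's API, and all names in it are illustrative.

```python
def whitespace_tokenizer(text):
    """Turn a string into a stream of tokens."""
    for token in text.split():
        yield token


def lowercase_filter(tokens):
    """Token filter: lowercase every token in the stream."""
    for token in tokens:
        yield token.lower()


def stop_filter(stop_words):
    """Build a token filter that drops the given stop words."""
    def apply(tokens):
        for token in tokens:
            if token not in stop_words:
                yield token
    return apply


def make_analyzer(tokenizer, *filters):
    """Compose a tokenizer and filters into a single analyze function."""
    def analyze(text):
        stream = tokenizer(text)
        for token_filter in filters:
            stream = token_filter(stream)
        return list(stream)
    return analyze
```

A language-specific analyzer is then a matter of swapping in different filters, for example a stop-word filter followed by normalization and stemming filters for the target language.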

C.  NUTCH 

1.  Architecture  Overview 

Nutch  is  a  complete  open-source  Web  search  engine  that  can  operate  at  one  of 
three  scales:  local  file  system,  intranet,  or  whole  Web  [15].  Nutch  can  be  divided  into 
two  parts:  the  crawler  and  the  searcher. 

From [18], the components of the crawler are WebDB, the fetch lists, the fetchers and
the updates. WebDB is a custom database that tracks every known page and relevant link. It
maintains a small set of facts about each page, such as the last crawled date. Fetch lists
are generated from WebDB. These lists contain the URLs that are to be downloaded.
The fetchers consume the fetch lists to produce the WebDB updates and the Web
contents. The updates tell which pages have changed since the last crawl. The contents are
used for searching. The WebDB-fetch cycle is designed to repeat forever, maintaining an
up-to-date image of the Web.
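
The generate, fetch and update cycle can be sketched with an in-memory stand-in for WebDB. The `fetch_page` callable, the `depth` and `top_n` parameters, and the dictionary layout are assumptions made for illustration; real Nutch also scores pages when building fetch lists, which this sketch omits.

```python
def crawl(seed_urls, fetch_page, depth, top_n):
    """Run up to `depth` generate/fetch/update rounds.

    `fetch_page(url)` is a caller-supplied function returning a
    (text, outgoing_links) pair; it stands in for a fetcher.
    """
    web_db = {url: {"fetched": False} for url in seed_urls}  # known pages
    contents = {}
    for _ in range(depth):
        # Generate: build a fetch list of at most top_n unfetched pages.
        fetch_list = [url for url, facts in web_db.items()
                      if not facts["fetched"]][:top_n]
        if not fetch_list:
            break
        for url in fetch_list:
            # Fetch: download the page and keep its content for indexing.
            text, out_links = fetch_page(url)
            contents[url] = text
            # Update: record the fetch and add newly discovered links.
            web_db[url]["fetched"] = True
            for link in out_links:
                web_db.setdefault(link, {"fetched": False})
    return web_db, contents
```

Under this scheme a crawl fetches at most `depth * top_n` pages.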

Once the Web content is produced, Nutch can get ready to process queries using
the searchers. First, the indexer processes all terms and pages of the Web content into an
inverted index. The document set is divided into a set of index segments, each of which
is fed into a single searcher process. Each searcher also draws upon the Web content
from earlier to provide a cached copy of any Web page. Finally, a pool of Web servers
handles the interaction with users and contacts the searchers for results. A generic
overview of Nutch’s architecture is shown in Figure 4.


Figure  4.  Nutch’s  architecture.  From  [18] 


2.  Plug-In  Architecture 

Nutch’s plug-in system is based on the Eclipse 2.0 plug-in architecture, which
provides a core service for controlling a set of tools working together to support
programming tasks. After reviewing Eclipse’s architecture from [19] and applying it to
Nutch’s plug-in system, we observe that the three most important components of Nutch’s
plug-in system are Extension, ExtensionPoints and Plug-in. The Extension class provides
a way to add new functions to a plug-in. It is defined by a plug-in that wants to
extend its functionality to another plug-in. ExtensionPoints define an interface that must
be implemented by the Extension. A plug-in, a pluggable component, defines a number of
extension points that allow it to be augmented by different kinds of extensions.

This  system  is  the  mechanism  of  Nutch’s  extensibility.  Users  can  contribute  to 
the  Nutch  platform  by  wrapping  their  tools  in  plug-ins.  The  new  plug-ins  can  add  new 
processing  elements  to  existing  plug-ins,  and  Nutch  provides  a  set  of  core  plug-ins  to 
assist  the  process. 
D.  CHAPTER SUMMARY


In this chapter, Lucene’s indexing process and analyzers were
examined, along with Nutch’s architecture and its plug-in system. In the next chapter,
the implementation of the light stemmer algorithm in Nutch is discussed.
IV.  ARABICANALYZER PLUG-IN DEVELOPMENT


A.  INTRODUCTION 

When Nutch finishes fetching a segment of Web sites, the language-identifier
plug-in is called to identify the language of the Web sites and attach a language code to
them. After that, the AnalyzerFactory instantiates the NutchAnalyzer
extension that is associated with the specific language code. The
NutchAnalyzer extension point is an abstract class that extends the Lucene Analyzer
class, so that Lucene analyzers can be easily integrated as NutchAnalyzer plug-ins. The
policy of the AnalyzerFactory for finding the NutchAnalyzer extension to use is to return
the first one that matches the specified language code. If none is found, the default
NutchDocumentAnalyzer is used. After the AnalyzerFactory identifies the right analyzer
based on the language code, the NutchAnalyzer calls the corresponding analyzer, in this
case ArabicAnalyzer, from the Lucene analyzer library to index the Web site. The process
of indexing a Web site is shown in Figure 5.


Figure  5.  The  process  of  indexing  a  Web  site 
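
The lookup policy described above (first matching extension wins, otherwise fall back to the default) can be sketched as follows. The class below only borrows the AnalyzerFactory name for readability; it is a simplified stand-in and not Nutch's implementation, and the registration method is invented for illustration.

```python
class DefaultAnalyzer:
    """Stand-in for the default analyzer: tokenizes without filtering."""

    def tokens(self, text):
        return text.split()


class AnalyzerFactory:
    """Selects an analyzer by language code."""

    def __init__(self, default):
        self._extensions = []  # (language code, analyzer) in registration order
        self._default = default

    def register(self, lang, analyzer):
        self._extensions.append((lang, analyzer))

    def get(self, lang):
        for code, analyzer in self._extensions:
            if code == lang:
                return analyzer  # the first matching extension is returned
        return self._default    # no match: use the default analyzer
```
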
B.  REQUIREMENT


To give Nutch an Arabic search capability, several tasks need to
be completed. First, the Lucene analysis library needs to be updated with an
ArabicAnalyzer that implements the light stemming algorithm. Second, an
ArabicAnalyzer plug-in is needed so that Nutch can access the Lucene analysis
library. Finally, an Arabic n-gram profile is needed to train Nutch to recognize
Arabic text.

C.  DEVELOPMENT  PROCESS 

1.  Implementation  of  the  Light  Stemmer  Algorithm 

As  stated  above,  the  Lucene  analysis  library  needs  to  be  updated  with  the 
ArabicAnalyzer,  which  implements  the  light  stemmer  algorithm.  The  analysis  package 
contains  three  primary  files:  ArabicAnalyzer,  ArabicNormalizationFilter,  and 
ArabicStemFilter. 

The ArabicAnalyzer first creates a list of Arabic stop words based on the
stoplist from http://members.unine.ch/jacques.savoy/clef/index.html. It uses the standard
StopFilter to filter out all the stop words from the token stream. The result is then fed
into ArabicNormalizationFilter, which normalizes the orthography. The final result is then
fed into the ArabicStemFilter, which stems the token stream using the light stemmer
algorithm.

2.  Development of ArabicAnalyzer Plug-in

The host plug-in is the ArabicAnalyzer class in Nutch. The NutchAnalyzer, a
built-in Nutch extension point, defines the interface that must be implemented by
Nutch’s ArabicAnalyzer. The extender plug-in is the ArabicAnalyzer from Lucene’s
analysis library, which extends the functions of Nutch’s ArabicAnalyzer; in this case,
Lucene’s ArabicAnalyzer enables Nutch’s ArabicAnalyzer to index Arabic text.
In essence, Nutch’s ArabicAnalyzer plug-in is a wrapper that sets the stage and makes
it possible to run Lucene’s ArabicAnalyzer. The ArabicAnalyzer plug-in architecture,
derived from [19], is shown in Figure 6.


Host plug-in
    Plug-in class:    ArabicAnalyzer
    Plug-in id:       org.apache.nutch.analysis.ar
    Extension point:  NutchAnalyzer

Extender plug-in
    Plug-in class:    Lib.lucene.analyzer
    Extension:        Analyzer
    Class:            ArabicAnalyzer
    Plug-in id:       org.apache.lucene.analysis.ar

Figure 6.  ArabicAnalyzer plug-in architecture. From [19]

3.  Creating Arabic N-gram Profile

The language-identifier plug-in in the standard Nutch library is used to create an
Arabic profile based on the “1000 most frequent words” by Jacques Savoy from the Web
site http://members.unine.ch/jacques.savoy/clef/index.html. This trains Nutch to
“recognize” Arabic Web sites so that it can invoke the right analyzer to index the Web
sites.
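
To illustrate how an n-gram profile can identify a language, the sketch below builds ranked character n-gram profiles and compares them with a rank-distance ("out of place") measure. This is a generic technique chosen for illustration; it is not claimed to be the algorithm used by Nutch's language-identifier plug-in, and `PROFILE_SIZE` and the function names are hypothetical.

```python
from collections import Counter

PROFILE_SIZE = 300  # hypothetical cap on the n-grams kept per profile


def ngram_profile(text, n_max=3):
    """Map each of the most frequent character n-grams to its rank."""
    counts = Counter()
    for word in text.split():
        padded = "_" + word + "_"  # mark word boundaries
        for n in range(1, n_max + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(PROFILE_SIZE)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; unknown n-grams get the maximum penalty."""
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else PROFILE_SIZE
        for gram, rank in doc_profile.items()
    )


def identify(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```

A profile built from a list of frequent Arabic words, such as the one described above, would play the role of one entry in `profiles`.
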
D.  CHAPTER SUMMARY


In this chapter, the ArabicAnalyzer plug-in development process was discussed.
Lucene’s analyzer library was enhanced with the ArabicAnalyzer, which implements the
light stemming algorithm. Nutch’s plug-in architecture was used to create the
ArabicAnalyzer plug-in. The plug-in enables the Nutch search engine to index
Arabic-language Web sites using the ArabicAnalyzer in Lucene’s analyzer library. In the
next chapter, the performance of ArabicAnalyzer and NutchDocumentAnalyzer is
compared.
V.  EXPERIMENTAL SETUP


A.  PROBLEM  STATEMENT 

These experiments compare the results of Nutch when it uses the default
NutchDocumentAnalyzer and when it uses ArabicAnalyzer to analyze the same Web sites.

The NutchDocumentAnalyzer separates the stream of tokens into individual terms
without applying any filter. For example, the token stream “hello world” becomes
“hello” and “world” after NutchDocumentAnalyzer processes it. This study uses
NutchDocumentAnalyzer’s index result as a baseline, because no term is discarded during
indexing when using NutchDocumentAnalyzer [15].

On the other hand, the ArabicAnalyzer applies several filters when analyzing the
stream of tokens. First, the token stream goes to StopFilter, which removes all the stop
words in the custom-built stop word list. The result is then filtered using
ArabicNormalizationFilter to normalize the orthography. After that, the result is
filtered again using ArabicStemFilter, which applies the light stemming algorithm. The
final result is then stored in the index database.

B.  HARDWARE  AND  SOFTWARE  CONFIGURATION 

The platform used to conduct the experiments was a single Dell XPS M1530
laptop personal computer. This machine had an Intel Core 2 Duo CPU T9300 at 2.5 GHz
with 4 GB of RAM and a 185 GB hard disk. The operating system used was Microsoft
Windows Vista Home Premium with Service Pack 2.

Nutch  1.0  and  Lucene  2.4.0  were  used  to  implement  the  light  stemmer  algorithm 
and  for  all  the  experiments. 
C.  METHODOLOGY


Three experiments were conducted to collect data. The first experiment used Nutch to
crawl eight Web sites with a depth of five and a topN of 50. TopN determines the
maximum number of pages that are retrieved at each level up to the depth. The Web sites
were alarabiya.net, aljazeera.net, alriyadh.com, addustour.com, aawsat.com,
bbc.co.uk/Arabic/, arabic.cnn.com and america.gov/ar/. Nutch indexed only the Web
pages within these sites, using ArabicAnalyzer and NutchDocumentAnalyzer.

The second experiment computes the average crawl time and its standard deviation. The crawler was set to crawl four of the eight Web sites listed above, 25 times each.
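
The measurement procedure of the second experiment can be sketched as follows. The thesis does not state how the crawls were timed or whether the sample or the population standard deviation was reported, so this sketch assumes wall-clock timing and the sample standard deviation; `run_once` is a hypothetical callable that performs one complete crawl.

```python
import statistics
import time

def time_runs(run_once, repetitions=25):
    """Run the crawl `repetitions` times and return each duration in seconds."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_once()  # hypothetical: launches one full crawl and waits for it
        durations.append(time.perf_counter() - start)
    return durations

def summarize(durations):
    """Return (mean, sample standard deviation) of the measured durations."""
    return statistics.mean(durations), statistics.stdev(durations)
```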

The third experiment compares the rankings of the top 10 pages returned by the two analyzers when searching for three different Arabic terms.

To disable ArabicAnalyzer, the following property was added to the nutch-site.xml file in the conf folder. Because the plug-in list does not name any analysis plug-in, AnalyzerFactory is forced to use NutchDocumentAnalyzer to index these sites:

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|language-identifier</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  </description>
</property>

To  enable  ArabicAnalyzer,  the  following  code  replaces  the  above  code  within  the 
nutch-site.xml  file.  With  ArabicAnalyzer  on,  the  AnalyzerFactory  uses  it  to  index  these 
sites: 

<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|language-identifier|analysis-ar</value>
  <description>Regular expression naming plugin directory names to
  include. Any plugin not matching this expression is excluded.
  </description>
</property>
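
The effect of the two plugin.includes values can be illustrated with a short Python sketch. It assumes, as the description text suggests, that a plug-in is loaded only when its whole directory name matches the regular expression; the helper name `is_included` is hypothetical and the check is a stand-in for what Nutch does internally.

```python
import re

# The two plugin.includes values used in the experiments.
WITHOUT_ARABIC = (
    "protocol-http|urlfilter-regex|parse-(text|html|js)|index-(basic|anchor)"
    "|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic"
    "|language-identifier"
)
WITH_ARABIC = WITHOUT_ARABIC + "|analysis-ar"

def is_included(plugin_includes, plugin_id):
    """True when the whole plug-in name matches the expression (assumed rule)."""
    return re.fullmatch(plugin_includes, plugin_id) is not None

for name in ("parse-html", "language-identifier", "analysis-ar"):
    print(name, is_included(WITHOUT_ARABIC, name), is_included(WITH_ARABIC, name))
```

Only the second expression admits analysis-ar, which is why switching between the two property blocks is enough to turn the ArabicAnalyzer plug-in on and off.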

D.  RESULTS  AND  DISCUSSION 

1.  Terms  Count 

The first experiment shows that Nutch needs roughly 19% to 38% fewer terms (computed from Table 2) to index the same number of documents from the same Web site when it uses ArabicAnalyzer. The result also suggests that searching the ArabicAnalyzer index database is more efficient, because fewer terms have to be searched to locate the relevant documents. See Table 2 for the detailed breakdown of each Web site.


Table 2. Number of indexed terms per Web site

Web site             NutchDocumentAnalyzer    ArabicAnalyzer
                     (terms count)            (terms count)
arabic.cnn.com       24776                    15827
alarabiya.net        21140                    15806
alriyadh.com         20898                    13163
aljazeera.net        18096                    13658
bbc.co.uk/arabic/    16061                    9957
america.gov/ar/      11435                    7958
addustour.com        2888                     2075
aawsat.com           1050                     847
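
The per-site reduction in indexed terms can be recomputed from the counts in Table 2. The following Python sketch is only a convenience for checking the percentages; the dictionary and function names are chosen for this example.

```python
# (NutchDocumentAnalyzer terms, ArabicAnalyzer terms) as listed in Table 2.
TERM_COUNTS = {
    "arabic.cnn.com": (24776, 15827),
    "alarabiya.net": (21140, 15806),
    "alriyadh.com": (20898, 13163),
    "aljazeera.net": (18096, 13658),
    "bbc.co.uk/arabic/": (16061, 9957),
    "america.gov/ar/": (11435, 7958),
    "addustour.com": (2888, 2075),
    "aawsat.com": (1050, 847),
}

def reduction_percent(baseline, stemmed):
    """Percentage of terms saved relative to the baseline index."""
    return 100.0 * (baseline - stemmed) / baseline

reductions = {site: reduction_percent(*counts) for site, counts in TERM_COUNTS.items()}

for site, percent in sorted(reductions.items(), key=lambda item: item[1]):
    print(f"{site:20s} {percent:5.1f}%")
```

The smallest reduction is for aawsat.com and the largest for bbc.co.uk/arabic/.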
2. Crawl Time


The second experiment shows that Nutch takes longer to index the same Web sites when it uses ArabicAnalyzer. This result is expected, because ArabicAnalyzer contains more filters; thus, it requires more processing power and time to index Web sites.

The results, as illustrated in Tables 3 to 6, also show that the crawl times fluctuated more when Nutch used ArabicAnalyzer.


Table 3. Average crawl time of www.america.gov/ar/

Analyzer                 Average crawl time (sec)    Standard deviation (sec)
NutchDocumentAnalyzer    362.92                      15.7
ArabicAnalyzer           375.2                       25.32

Table 4. Average crawl time of www.bbc.co.uk/arabic/

Analyzer                 Average crawl time (sec)    Standard deviation (sec)
NutchDocumentAnalyzer    482.76                      5.95
ArabicAnalyzer           546.64                      37.05

Table 5. Average crawl time of www.addustour.com

Analyzer                 Average crawl time (sec)    Standard deviation (sec)
NutchDocumentAnalyzer    104.56                      1.67
ArabicAnalyzer           105.12                      2.38

Table 6. Average crawl time of www.aawsat.com

Analyzer                 Average crawl time (sec)    Standard deviation (sec)
NutchDocumentAnalyzer    69.56                       2.52
ArabicAnalyzer           70.2                        2.84
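
The relative slowdown per site can be derived from the averages in Tables 3 to 6. The Python sketch below only restates those averages; the variable and function names are chosen for this example.

```python
# (NutchDocumentAnalyzer mean, ArabicAnalyzer mean) in seconds, from Tables 3 to 6.
CRAWL_TIMES = {
    "www.america.gov/ar/": (362.92, 375.2),
    "www.bbc.co.uk/arabic/": (482.76, 546.64),
    "www.addustour.com": (104.56, 105.12),
    "www.aawsat.com": (69.56, 70.2),
}

def overhead_percent(baseline, arabic):
    """Extra crawl time of ArabicAnalyzer relative to the baseline, in percent."""
    return 100.0 * (arabic - baseline) / baseline

overheads = {site: overhead_percent(*times) for site, times in CRAWL_TIMES.items()}

for site, percent in overheads.items():
    print(f"{site:24s} +{percent:.2f}%")
```

The overhead is largest for www.bbc.co.uk/arabic/ and below 1% for the two smallest sites. For every site except www.bbc.co.uk/arabic/, the difference between the two averages is smaller than the standard deviation reported for the baseline.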
3. Search Results


For  the  third  experiment,  the  index  database  of  the  Web  site  www.america.gov/ar/ 
is  used  to  collect  search  results  data.  The  terms  shown  in  Table  7  are  used  for  the  search. 


Table 7. Search terms

Normal Form      Light Stemmer Form    Meaning
[Arabic term]    [Arabic term]         Economy
[Arabic term]    [Arabic term]         The United States
[Arabic term]    [Arabic term]         Democratic

The Light Stemmer forms are searched using the ArabicAnalyzer's index database and the Normal forms are searched using the NutchDocumentAnalyzer's index database.

When comparing the top 10 pages for the search term “Economy,” seven pages appear in both result lists; for the search term “The United States,” all top 10 pages are the same; and for the search term “Democratic,” six pages are the same but are ranked differently. In all three cases, the top-ranked results from NutchDocumentAnalyzer have slightly higher scores than the corresponding results from ArabicAnalyzer.

Judging by the titles of the search results, one can conclude that their contents are related to the search terms. Both analyzers therefore score well on the relevance of the returned information to the search terms. See Tables 8 through 13 for the breakdown.
Table 8. Search results of term “Economy” using ArabicAnalyzer

Score for Query    Top 10 pages using ArabicAnalyzer
0.3486507          www.america.gov/ar/econ.html
0.12422927         www.america.gov/ar/publications/books/outline-of-the-us-economy.html
0.09217107         www.america.gov/ar/econ/business.html
0.033118278        www.america.gov/ar/reviving_trade_ar.html
0.016127191        www.america.gov/ar/publications/books.html#outline_economy
0.003058498        http://www.america.gov/ar/
6.69E-04           http://www.america.gov/ar/multimedia/photogallery.html
6.47E-04           www.america.gov/ar/publications/books.html
5.85E-04           www.america.gov/ar/publications/ejournalusa/1209.html
5.73E-04           www.america.gov/ar/index.html
Table 9. Search results of term “Economy” using NutchDocumentAnalyzer

Score for Query    Top 10 pages using NutchDocumentAnalyzer
0.38501537         www.america.gov/ar/econ.html
0.13663794         www.america.gov/ar/publications/books/outline-of-the-us-economy.html
0.09989148         www.america.gov/ar/econ/business.html
0.01747075         www.america.gov/ar/publications/books.html#outline_economy
0.002472951        www.america.gov/ar/
5.41E-04           www.america.gov/ar/multimedia/photogallery.html
4.64E-04           www.america.gov/ar/index.html
4.64E-04           www.america.gov/ar/world/europe.html
4.64E-04           www.america.gov/ar/world/mideast.html
4.02E-04           www.america.gov/ar/world/sc_asia.html
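
The overlap between the two “Economy” result lists (Tables 8 and 9) can be checked mechanically. The Python sketch below lists the pages as paths relative to www.america.gov/ar/; the list and function names are chosen for this example.

```python
# Pages in rank order, as paths relative to www.america.gov/ar/.
ARABIC_ANALYZER_TOP10 = [
    "econ.html",
    "publications/books/outline-of-the-us-economy.html",
    "econ/business.html",
    "reviving_trade_ar.html",
    "publications/books.html#outline_economy",
    "",  # the site root, www.america.gov/ar/
    "multimedia/photogallery.html",
    "publications/books.html",
    "publications/ejournalusa/1209.html",
    "index.html",
]
NUTCH_ANALYZER_TOP10 = [
    "econ.html",
    "publications/books/outline-of-the-us-economy.html",
    "econ/business.html",
    "publications/books.html#outline_economy",
    "",
    "multimedia/photogallery.html",
    "index.html",
    "world/europe.html",
    "world/mideast.html",
    "world/sc_asia.html",
]

def common_pages(first, second):
    """Pages that occur in both lists, in the order of the first list."""
    second_set = set(second)
    return [page for page in first if page in second_set]

def same_rank_count(first, second):
    """Number of positions at which both lists show the same page."""
    return sum(1 for a, b in zip(first, second) if a == b)

shared = common_pages(ARABIC_ANALYZER_TOP10, NUTCH_ANALYZER_TOP10)
print(len(shared), same_rank_count(ARABIC_ANALYZER_TOP10, NUTCH_ANALYZER_TOP10))
```

Seven pages are shared by both lists, and the first three positions are identical; from the fourth position onward the shared pages are shifted because ArabicAnalyzer also returns reviving_trade_ar.html.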
Table 10. Search results of term “The United States” using ArabicAnalyzer

Score for Query    Top 10 pages using ArabicAnalyzer
0.1196895          www.america.gov/ar/pages/footer/local/about-us.html
0.11078926         www.america.gov/ar/publications/books-content/musliminamerica.html
0.105654           www.america.gov/ar/amlife.html
0.042377986        www.america.gov/ar/services/mobile.html
0.022628564        www.america.gov/ar/multimedia/photogallery.html#/4110/mosques_ar/
0.015091554        www.america.gov/ar/publications/books.html#beingmuslim
0.01136245         www.america.gov/ar/multimedia/photogallery.html#/4110/religious_freedom_ar/
0.01132718         www.america.gov/ar/publications/books.html#governed
0.011314282        www.america.gov/ar/multimedia/photogallery.html#/4110/islam_ar/
0.003082759        www.america.gov/ar/
Table 11. Search results of term “The United States” using NutchDocumentAnalyzer

Score for Query    Top 10 pages using NutchDocumentAnalyzer
0.11997691         www.america.gov/ar/pages/footer/local/about-us.html
0.11105819         www.america.gov/ar/publications/books-content/musliminamerica.html
0.10594571         www.america.gov/ar/amlife.html
0.042462345        www.america.gov/ar/services/mobile.html
0.022691099        www.america.gov/ar/multimedia/photogallery.html#/4110/mosques_ar/
0.01513323         www.america.gov/ar/publications/books.html#beingmuslim
0.011393607        www.america.gov/ar/multimedia/photogallery.html#/4110/religious_freedom_ar/
0.011358418        www.america.gov/ar/publications/books.html#governed
0.01134555         www.america.gov/ar/multimedia/photogallery.html#/4110/islam_ar/
0.002807639        www.america.gov/ar/
Table 12. Search results of term “Democratic” using ArabicAnalyzer

Score for Query    Top 10 pages using ArabicAnalyzer
0.2665834          www.america.gov/ar/global/democracy.html
0.16062789         www.america.gov/ar/global.html
0.033635326        www.america.gov/ar/publications/ejournalusa/608.html
0.030587077        www.america.gov/ar/publications/ejournalusa/0110.html
0.027611194        www.america.gov/ar/democracy/global/index.html
0.002160408        www.america.gov/ar/
6.10E-04           www.america.gov/ar/multimedia/podcast.html
5.85E-04           www.america.gov/ar/publications/books.html
5.46E-04           www.america.gov/ar/amlife.html
5.40E-04           www.america.gov/ar/publications/ejournalusa.html
Table 13. Search results of term “Democratic” using NutchDocumentAnalyzer

Score for Query    Top 10 pages using NutchDocumentAnalyzer
0.29354417         www.america.gov/ar/global/democracy.html
0.17680001         www.america.gov/ar/global.html
0.033559922        www.america.gov/ar/publications/ejournalusa/0110.html
0.030395675        www.america.gov/ar/democracy/global/index.html
0.002139257        www.america.gov/ar/
6.04E-04           www.america.gov/ar/multimedia/podcast.html
5.40E-04           http://www.america.gov/ar/amlife.html
4.68E-04           www.america.gov/ar/amlife/people.html
4.68E-04           www.america.gov/ar/econ.html
4.68E-04           www.america.gov/ar/multimedia.html
A more detailed breakdown of the scores of the top 10 pages using ArabicAnalyzer and NutchDocumentAnalyzer is shown in Appendices A through F.

E.  CHAPTER  SUMMARY 

This chapter described the results of several experiments that compare the performance of ArabicAnalyzer and NutchDocumentAnalyzer. The next chapter presents the thesis summary and recommendations for future work.
VI. CONCLUSION


A.  SUMMARY 

Arabic IR is a challenging problem because of the complexity of the Arabic language. Even though the light stemmer algorithm is not a perfect solution for the Arabic IR problem, it showed improvement over other popular methods. The ArabicAnalyzer plug-in inherited the strengths and weaknesses of the light stemmer algorithm. It also was not perfect, but it did show great promise in saving storage overhead.

The experiments completed in this thesis showed that implementing the ArabicAnalyzer plug-in has both advantages and disadvantages. The data show that, in general, the ArabicAnalyzer plug-in performed as well as the default setting: the query results were relevant to the search terms. The plug-in ran slower than the default setting, but the speed issue could be overlooked, since the data that this research was trying to gather did not have to be collected in real time. On the other hand, the ArabicAnalyzer plug-in indexed roughly 19% to 38% fewer terms than the default setting, which should reduce the storage required for its index database; the savings in storage could become a major plus when indexing the Internet.

B.  FUTURE  WORK 

For future research, the plug-in needs to be integrated into the social networking tool, and experiments need to be conducted to determine the recall, precision and relevance of the plug-in in the integrated environment. The experiments should also help determine the strengths and weaknesses of the plug-in in such environments and point to possible improvements.
APPENDIX A

This appendix lists the detailed query scores of the top 10 pages using ArabicAnalyzer.

Search Term: [Arabic term] (economy)

Page 1:

• boost = 0.22821301
• digest = 767d250a62c827c2bd330c0674546358
• lang = ar
• segment = 20100305180909
• title = [Arabic title] - America.gov
• tstamp = 20100305230954510
• url = http://www.america.gov/ar/econ.html

score for query: [Arabic term]

• 0.3486507 = (MATCH) sum of:
  o 0.18338637 = (MATCH) weight(anchor:[Arabic term]^2.0 in 15), product of:
    ■ 0.2879631 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.63683987 = (MATCH) fieldWeight(anchor:[Arabic term] in 15), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.15625 = fieldNorm(field=anchor, doc=15)
  o 6.6904654E-4 = (MATCH) weight(content:[Arabic term] in 15), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.017808065 = (MATCH) fieldWeight(content:[Arabic term] in 15), product of:
      ■ 2.4494898 = tf(termFreq(content:[Arabic term])=6)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.0068359375 = fieldNorm(field=content, doc=15)
  o 0.16459529 = (MATCH) weight(title:[Arabic term]^1.5 in 15), product of:
    ■ 0.23745762 = queryWeight(title:[Arabic term]^1.5), product of:
      ■ 1.5 = boost
      ■ 4.4812403 = idf(docFreq=3, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.6931565 = (MATCH) fieldWeight(title:[Arabic term] in 15), product of:
      ■ 1.4142135 = tf(termFreq(title:[Arabic term])=2)
      ■ 4.4812403 = idf(docFreq=3, numDocs=130)
      ■ 0.109375 = fieldNorm(field=title, doc=15)
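
The numbers in the Page 1 listing can be reproduced by hand. The following Python sketch assumes the default Lucene 2.4 similarity, in which tf is the square root of the term frequency and idf is 1 + ln(numDocs / (docFreq + 1)); queryNorm and fieldNorm are taken from the listing as given rather than recomputed, and the function names are chosen for this example.

```python
import math

def idf(doc_freq, num_docs):
    """Inverse document frequency under the assumed default similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def tf(term_freq):
    """Term frequency factor under the assumed default similarity."""
    return math.sqrt(term_freq)

def clause_score(boost, term_freq, doc_freq, num_docs, query_norm, field_norm):
    """Score of one field clause: queryWeight multiplied by fieldWeight."""
    query_weight = boost * idf(doc_freq, num_docs) * query_norm
    field_weight = tf(term_freq) * idf(doc_freq, num_docs) * field_norm
    return query_weight * field_weight

QUERY_NORM = 0.035326175  # copied from the Page 1 listing

anchor = clause_score(2.0, 1, 5, 130, QUERY_NORM, 0.15625)
content = clause_score(1.0, 6, 121, 130, QUERY_NORM, 0.0068359375)
title = clause_score(1.5, 2, 3, 130, QUERY_NORM, 0.109375)

print(anchor, content, title, anchor + content + title)
```

The three clause scores agree with the anchor, content and title weights of Page 1 to within rounding, and their sum agrees with the page's total score. The anchor and title clauses dominate because of their boosts and their much larger fieldNorm values.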

Page 2:

• boost = 0.16124225
• digest = fdaa17fd08dfde3bb91a83a6d98afa04
• lang = ar
• segment = 20100305180909
• title = [Arabic title] - Outline of the U.S. Economy - America.gov
• tstamp = 20100305230918398
• url = http://www.america.gov/ar/publications/books/outline-of-the-us-economy.html

score for query: [Arabic term]

• 0.12422927 = (MATCH) sum of:
  o 0.07335455 = (MATCH) weight(anchor:[Arabic term]^2.0 in 84), product of:
    ■ 0.2879631 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.25473595 = (MATCH) fieldWeight(anchor:[Arabic term] in 84), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.0625 = fieldNorm(field=anchor, doc=84)
  o 9.948079E-4 = (MATCH) weight(content:[Arabic term] in 84), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.026478883 = (MATCH) fieldWeight(content:[Arabic term] in 84), product of:
      ■ 5.0990195 = tf(termFreq(content:[Arabic term])=26)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.0048828125 = fieldNorm(field=content, doc=84)
  o 0.049879905 = (MATCH) weight(title:[Arabic term]^1.5 in 84), product of:
    ■ 0.23745762 = queryWeight(title:[Arabic term]^1.5), product of:
      ■ 1.5 = boost
      ■ 4.4812403 = idf(docFreq=3, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.21005814 = (MATCH) fieldWeight(title:[Arabic term] in 84), product of:
      ■ 1.0 = tf(termFreq(title:[Arabic term])=1)
      ■ 4.4812403 = idf(docFreq=3, numDocs=130)
      ■ 0.046875 = fieldNorm(field=title, doc=84)


Page 3:

• boost = 0.16781548
• digest = b4649130898e202ca38ef61b6b22b917
• lang = ar
• segment = 20100305180909
• title = [Arabic title] - America.gov
• tstamp = 20100305231006880
• url = http://www.america.gov/ar/econ/business.html

score for query: [Arabic term]

• 0.09217107 = (MATCH) sum of:
  o 0.091693185 = (MATCH) weight(anchor:[Arabic term]^2.0 in 16), product of:
    ■ 0.2879631 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.31841993 = (MATCH) fieldWeight(anchor:[Arabic term] in 16), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.078125 = fieldNorm(field=anchor, doc=16)
  o 4.7789037E-4 = (MATCH) weight(content:[Arabic term] in 16), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.012720046 = (MATCH) fieldWeight(content:[Arabic term] in 16), product of:
      ■ 2.4494898 = tf(termFreq(content:[Arabic term])=6)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.0048828125 = fieldNorm(field=content, doc=16)

Page 4:

• boost = 0.030659562
• digest = 4304d87a1d51187c1c1d0b2b4d1597a8
• lang = ar
• segment = 20100305181031
• title = [Arabic title] - America.gov
• tstamp = 20100305231127141
• url = http://www.america.gov/ar/reviving_trade_ar.html

score for query: [Arabic term]

• 0.033118278 = (MATCH) sum of:
  o 0.018338637 = (MATCH) weight(anchor:[Arabic term]^2.0 in 104), product of:
    ■ 0.2879631 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.06368399 = (MATCH) fieldWeight(anchor:[Arabic term] in 104), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.015625 = fieldNorm(field=anchor, doc=104)
  o 8.363082E-5 = (MATCH) weight(content:[Arabic term] in 104), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.002226008 = (MATCH) fieldWeight(content:[Arabic term] in 104), product of:
      ■ 2.4494898 = tf(termFreq(content:[Arabic term])=6)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 8.544922E-4 = fieldNorm(field=content, doc=104)
  o 0.014696008 = (MATCH) weight(title:[Arabic term]^1.5 in 104), product of:
    ■ 0.23745762 = queryWeight(title:[Arabic term]^1.5), product of:
      ■ 1.5 = boost
      ■ 4.4812403 = idf(docFreq=3, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.06188897 = (MATCH) fieldWeight(title:[Arabic term] in 104), product of:
      ■ 1.4142135 = tf(termFreq(title:[Arabic term])=2)
      ■ 4.4812403 = idf(docFreq=3, numDocs=130)
      ■ 0.009765625 = fieldNorm(field=title, doc=104)

Page 5:

• boost = 0.02675021
• digest = aaf055c1e690c63cf69285f8ab04f499
• lang = ar
• segment = 20100305181330
• title = [Arabic title] - America.gov
• tstamp = 20100305231345369
• url = http://www.america.gov/ar/publications/books.html#outline_economy

score for query: [Arabic term]

• 0.016127191 = (MATCH) sum of:
  o 0.016046308 = (MATCH) weight(anchor:[Arabic term]^2.0 in 80), product of:
    ■ 0.2879631 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.05572349 = (MATCH) fieldWeight(anchor:[Arabic term] in 80), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.075775 = idf(docFreq=5, numDocs=130)
      ■ 0.013671875 = fieldNorm(field=anchor, doc=80)
  o 8.088332E-5 = (MATCH) weight(content:[Arabic term] in 80), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.0021528779 = (MATCH) fieldWeight(content:[Arabic term] in 80), product of:
      ■ 3.3166249 = tf(termFreq(content:[Arabic term])=11)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 6.1035156E-4 = fieldNorm(field=content, doc=80)

Page 6:

• boost = 1.0000145
• digest = 0d5b023c802941ddb358071073a98833
• lang = ar
• segment = 20100305180856
• title = [Arabic title] - America.gov
• tstamp = 20100305230902835
• url = http://www.america.gov/ar/

score for query: [Arabic term]

• 0.0030584983 = (MATCH) sum of:
  o 0.0030584983 = (MATCH) weight(content:[Arabic term] in 0), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.08140829 = (MATCH) fieldWeight(content:[Arabic term] in 0), product of:
      ■ 2.4494898 = tf(termFreq(content:[Arabic term])=6)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.03125 = fieldNorm(field=content, doc=0)

Page 7:

• boost = 0.23009512
• digest = 15d9ca5e7382f701cd03fb542ae3ab22
• lang = ar
• segment = 20100305180909
• title = [Arabic title] - America.gov
• tstamp = 20100305230915350
• url = http://www.america.gov/ar/multimedia/photogallery.html

score for query: [Arabic term]

• 6.6904654E-4 = (MATCH) sum of:
  o 6.6904654E-4 = (MATCH) weight(content:[Arabic term] in 38), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.017808065 = (MATCH) fieldWeight(content:[Arabic term] in 38), product of:
      ■ 2.4494898 = tf(termFreq(content:[Arabic term])=6)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.0068359375 = fieldNorm(field=content, doc=38)

Page 8:

• boost = 0.22996004
• digest = a0130240b4348578aa8a83e59187dfb3
• lang = ar
• segment = 20100305180909
• title = [Arabic title] - America.gov
• tstamp = 20100305231001279
• url = http://www.america.gov/ar/publications/books.html

score for query: [Arabic term]

• 6.4706657E-4 = (MATCH) sum of:
  o 6.4706657E-4 = (MATCH) weight(content:[Arabic term] in 73), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.017223023 = (MATCH) fieldWeight(content:[Arabic term] in 73), product of:
      ■ 3.3166249 = tf(termFreq(content:[Arabic term])=11)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.0048828125 = fieldNorm(field=content, doc=73)
Page 9:

• boost = 0.16516872
• digest = bc202e5e0f508e4291bb897eec7814dc
• lang = ar
• segment = 20100305180909
• title = [Arabic title] - 1209 - America.gov
• tstamp = 20100305230941328
• url = http://www.america.gov/ar/publications/ejournalusa/1209.html

score for query: [Arabic term]

• 5.8529375E-4 = (MATCH) sum of:
  o 5.8529375E-4 = (MATCH) weight(content:[Arabic term] in 97), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.01557881 = (MATCH) fieldWeight(content:[Arabic term] in 97), product of:
      ■ 3.0 = tf(termFreq(content:[Arabic term])=9)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.0048828125 = fieldNorm(field=content, doc=97)

Page 10:

• boost = 0.19712433
• digest = c25a22a11ab6bec420c26625155ced62
• lang = ar
• segment = 20100305180909
• title = [Arabic title] - America.gov
• tstamp = 20100305230929458
• url = http://www.america.gov/ar/index.html

score for query: [Arabic term]

• 5.7346845E-4 = (MATCH) sum of:
  o 5.7346845E-4 = (MATCH) weight(content:[Arabic term] in 30), product of:
    ■ 0.03756986 = queryWeight(content:[Arabic term]), product of:
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.035326175 = queryNorm
    ■ 0.015264055 = (MATCH) fieldWeight(content:[Arabic term] in 30), product of:
      ■ 2.4494898 = tf(termFreq(content:[Arabic term])=6)
      ■ 1.0635134 = idf(docFreq=121, numDocs=130)
      ■ 0.005859375 = fieldNorm(field=content, doc=30)
APPENDIX B

This appendix lists the detailed query scores of the top 10 pages using NutchDocumentAnalyzer.

Search Term: [Arabic term] (economy)

Page 1:

• boost = 0.22826105
• digest = c33a5dc3f7d8475491bfafcf91c8b283
• lang = ar
• segment = 20100307101102
• title = [Arabic title] - America.gov
• tstamp = 20100307151153574
• url = http://www.america.gov/ar/econ.html

score for query: [Arabic term]

• 0.38501537 = (MATCH) sum of:
  o 0.1991137 = (MATCH) weight(anchor:[Arabic term]^2.0 in 16), product of:
    ■ 0.29873407 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.6665249 = (MATCH) fieldWeight(anchor:[Arabic term] in 16), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.15625 = fieldNorm(field=anchor, doc=16)
  o 5.40958E-4 = (MATCH) weight(content:[Arabic term] in 16), product of:
    ■ 0.037221763 = queryWeight(content:[Arabic term]), product of:
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.01453338 = (MATCH) fieldWeight(content:[Arabic term] in 16), product of:
      ■ 2.0 = tf(termFreq(content:[Arabic term])=4)
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 0.0068359375 = fieldNorm(field=content, doc=16)
  o 0.18536073 = (MATCH) weight(title:[Arabic term]^1.5 in 16), product of:
    ■ 0.25088066 = queryWeight(title:[Arabic term]^1.5), product of:
      ■ 1.5 = boost
      ■ 4.776585 = idf(docFreq=2, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.7388403 = (MATCH) fieldWeight(title:[Arabic term] in 16), product of:
      ■ 1.4142135 = tf(termFreq(title:[Arabic term])=2)
      ■ 4.776585 = idf(docFreq=2, numDocs=131)
      ■ 0.109375 = fieldNorm(field=title, doc=16)

Page 2:

• boost = 0.16124225
• digest = 6120d6b7e6584b6a71b7d9990a68b952
• lang = ar
• segment = 20100307101102
• title = [Arabic title] - Outline of the U.S. Economy - America.gov
• tstamp = 20100307151112494
• url = http://www.america.gov/ar/publications/books/outline-of-the-us-economy.html

score for query: [Arabic term]

• 0.13663794 = (MATCH) sum of:
  o 0.07964548 = (MATCH) weight(anchor:[Arabic term]^2.0 in 85), product of:
    ■ 0.29873407 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.26660997 = (MATCH) fieldWeight(anchor:[Arabic term] in 85), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.0625 = fieldNorm(field=anchor, doc=85)
  o 8.196751E-4 = (MATCH) weight(content:[Arabic term] in 85), product of:
    ■ 0.037221763 = queryWeight(content:[Arabic term]), product of:
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.022021394 = (MATCH) fieldWeight(content:[Arabic term] in 85), product of:
      ■ 4.2426405 = tf(termFreq(content:[Arabic term])=18)
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 0.0048828125 = fieldNorm(field=content, doc=85)
  o 0.056172792 = (MATCH) weight(title:[Arabic term]^1.5 in 85), product of:
    ■ 0.25088066 = queryWeight(title:[Arabic term]^1.5), product of:
      ■ 1.5 = boost
      ■ 4.776585 = idf(docFreq=2, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.22390243 = (MATCH) fieldWeight(title:[Arabic term] in 85), product of:
      ■ 1.0 = tf(termFreq(title:[Arabic term])=1)
      ■ 4.776585 = idf(docFreq=2, numDocs=131)
      ■ 0.046875 = fieldNorm(field=title, doc=85)
Page 3:

• boost = 0.16784814
• digest = 2e923bcfb9409e9be88aad90198063bc
• lang = ar
• segment = 20100307101102
• title = [Arabic title] - America.gov
• tstamp = 20100307151205095
• url = http://www.america.gov/ar/econ/business.html

score for query: [Arabic term]

• 0.09989148 = (MATCH) sum of:
  o 0.09955685 = (MATCH) weight(anchor:[Arabic term]^2.0 in 17), product of:
    ■ 0.29873407 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.33326244 = (MATCH) fieldWeight(anchor:[Arabic term] in 17), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.078125 = fieldNorm(field=anchor, doc=17)
  o 3.34631E-4 = (MATCH) weight(content:[Arabic term] in 17), product of:
    ■ 0.037221763 = queryWeight(content:[Arabic term]), product of:
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.008990197 = (MATCH) fieldWeight(content:[Arabic term] in 17), product of:
      ■ 1.7320508 = tf(termFreq(content:[Arabic term])=3)
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 0.0048828125 = fieldNorm(field=content, doc=17)

Page 4:

• boost = 0.02675021
• digest = 4eb9183dbdc405b0d40ef3c92da5ed66
• lang = ar
• segment = 20100307101458
• title = [Arabic title] - America.gov
• tstamp = 20100307151513307
• url = http://www.america.gov/ar/publications/books.html#outline_economy

score for query: [Arabic term]

• 0.01747075 = (MATCH) sum of:
  o 0.017422449 = (MATCH) weight(anchor:[Arabic term]^2.0 in 81), product of:
    ■ 0.29873407 = queryWeight(anchor:[Arabic term]^2.0), product of:
      ■ 2.0 = boost
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.058320932 = (MATCH) fieldWeight(anchor:[Arabic term] in 81), product of:
      ■ 1.0 = tf(termFreq(anchor:[Arabic term])=1)
      ■ 4.2657595 = idf(docFreq=4, numDocs=131)
      ■ 0.013671875 = fieldNorm(field=anchor, doc=81)
  o 4.8299826E-5 = (MATCH) weight(content:[Arabic term] in 81), product of:
    ■ 0.037221763 = queryWeight(content:[Arabic term]), product of:
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 0.035015345 = queryNorm
    ■ 0.0012976233 = (MATCH) fieldWeight(content:[Arabic term] in 81), product of:
      ■ 2.0 = tf(termFreq(content:[Arabic term])=4)
      ■ 1.063013 = idf(docFreq=122, numDocs=131)
      ■ 6.1035156E-4 = fieldNorm(field=content, doc=81)

Page  5: 

•  boost  =  1.0000145 

•  digest  =  eed4dd9817b50ffda0aefl58be6e4cl2 

•  lang  =  ar 

•  segment  =  20100307101052 

•  title  =  ^  U' jJ is  -  America.gov 

.  tstamp  =  20100307151057483 

•  url  =  http://www.america.gov/ar/ 

score  for  query: 

.  0.00247295 1  =  (MATCH)  sum  of: 

o  0.00247295 1  =  (MATCH)  weight(content: in  0),  product  of: 

■  0.037221763  =  query  Weight(content:U't>A>=>^),  product  of: 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035015345  =  queryNorm 

■  0.06643831  =  (MATCH)  fieldWeight(content:yy^i_K>'^  in  0),
product  of: 

■  2.0  =  tf(termFreq(content:yy^i_K>^)=4) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03125  =  fieldNorm(field=content,  doc=0) 

Page  6: 

•  boost  =  0.23014276 

.  digest  =  5f50883579dcc0acb85ff3052764f758 


•  lang  =  ar 

•  segment  =  20100307101102

•  title  =  'Jm jf  'el' fcs  Jl>=> jj  -  America.gov 

•  tstamp  =  20100307151109851

•  url  =  http://www.america.gov/ar/multimedia/photogallery.html 

score  for  query: 

•  5.40958E-4  =  (MATCH)  sum  of: 

o  5.40958E-4  =  (MATCH)  weight(content:'J't>A>=>^  in  39),  product  of: 

■  0.037221763  =  queryWeight(content:U'ti^o^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035015345  =  queryNorm 

■  0.01453338  =  (MATCH)  fieldWeight(content:U'ti^A-K1^  in  39),
product  of:

■  2.0  =  tf(termFreq(content:C'lj^^'-)=4)

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.0068359375  =  fieldNorm(field=content,  doc=39) 

Page  7: 

.  boost  =  0.19715214 

•  digest  =  e84ec632a6d47466f40d6beacbcfbdf7 

•  lang  =  ar 

•  segment  =  20100307101102

•  title  =  ' <Ji jJ^  -  -  America.gov 

•  tstamp  =  20100307151123237

•  url  =  http://www.america.gov/ar/index.html 
score  for  query:  tJljpoaU 

•  4.636783E-4  =  (MATCH)  sum  of: 

o  4.636783E-4  =  (MATCH)  weight(content:U't>A>=>^  in  31),  product  of:

■  0.037221763  =  queryWeight(content:U'jAj-3'-2),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035015345  =  queryNorm 

■  0.012457183  =  (MATCH)  fieldWeight(content:'J't>A_K>^  in  31), 
product  of: 

■  2.0  =  tf(termFreq(content:'J'ti^L>=>^)=4)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.005859375  =  fieldNorm(field=content,  doc=31) 

Page  8: 

•  boost  =  0.20153543 

•  digest  =  67da72f899f80475f1ae62770921f1bd

•  lang  =  ar 

•  segment  =  20100307101102

•  title  =  ' jjjM  j' - ' jjjm'  j' jj'o'ls'  -  America.gov 
•  tstamp  =  20100307151127260

•  url  =  http://www.america.gov/ar/world/europe.html 

score  for  query: 

•  4.636783E-4  =  (MATCH)  sum  of: 

o  4.636783E-4  =  (MATCH)  weight(content:'J'jAj^  in  128),  product  of:

■  0.037221763  =  queryWeight(content:U'ti^u^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035015345  =  queryNorm 

■  0.012457183  =  (MATCH)  fieldWeight(content:J'ti^ij-3^  in  128),
product  of:

■  2.0  =  tf(termFreq(content:U'lj^^'A)=4)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.005859375  =  fieldNorm(field=content,  doc=128) 
Page  9:

•  boost  =  0.20091416 

•  digest  =  9256cf74ef9b595d81f726d5f347898a

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  t< — a -  IJ<j2 t< — a - 

America.gov 

•  tstamp  =  20100307151148461

•  url  =  http://www.america.gov/ar/world/mideast.html 

score  for  query: 

•  4.636783E-4  =  (MATCH)  sum  of:

o  4.636783E-4  =  (MATCH)  weight(content:'J'tiAj^  in  129),  product  of:

■  0.037221763  =  queryWeight(content:U't>A_K>^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035015345  =  queryNorm 

■  0.012457183  =  (MATCH)  fieldWeight(content:'J'ti^o^  in  129),
product  of:

■  2.0  =  tf(termFreq(content:C'liAj^'-)=4)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.005859375  =  fieldNorm(field=content,  doc=129) 

Page  10: 

.  boost  =  0.20039715 

•  digest  =  adbc4a97340b57bcc256c62131041c4c 

•  lang  =  ar 

•  segment  =  20100307101102

•  title  =  jlA jj  iotf'  -  j<_A  jii  iotf'  -  America.gov 

•  tstamp  =  20100307151119356

•  url  =  http://www.america.gov/ar/world/scasia.html

score  for  query: 

•  4.015572E-4  =  (MATCH)  sum  of:

o  4.015572E-4  =  (MATCH)  weight(content:'JiL3M>al-i  in  130),  product  of:

■  0.037221763  =  queryWeight(content:U't>A_K>^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035015345  =  queryNorm 

■  0.010788237  =  (MATCH)  fieldWeight(content:'J'ti^o^  in  130),
product  of: 

■  1.7320508  =  tf(termFreq(content:'J'ti^L>a^)=3) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.005859375  =  fieldNorm(field=content,  doc=130) 
APPENDIX  C


This  appendix  lists  the  detailed  query  scores  for  the  top  10  pages  using  ArabicAnalyzer.

Search  Term:  Vs?~^'  (The  United  States) 

Page  1: 

•  boost  =  0.15805063 

•  digest  =  6b6baa67bd29d99a3cf293efeb2bc3e1

•  lang  =  ar 

•  segment  =  20100305181031 

•  title  =  £ j J  ? sdc  Vs? -U0  t >-*  ~  ? Jiit  Vs? £ >-»  -  America.gov 

.  tstamp  =  20100305231050378 

•  url  =  http://www.america.gov/ar/pages/footer/local/about-us.html 
score  for  query:  Vs?>^’ 

•  0.1196895  =  (MATCH)  sum  of:

o  0.059957497  =  (MATCH)  weight(anchor:Vs? J^'^2.0  in  68),  product  of:

■  0.261373  =  queryWeight(anchor:Vs? J^*^2.0),  product  of:

■  2.0  =  boost 

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.22939436  =  (MATCH)  fieldWeight( anchor:  Vs? in  68), 
product  of: 

■  1.0  =  tf(termFreq(anchor:  Vs?  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.0625  =  fieldNorm(field=anchor,  doc=68) 

o  4.8168114E-4  =  (MATCH)  weight(content:  Vs? in  68),  product  of:

■  0.037867878  =  queryWeight(content:^j^'),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130)

■  0.035606395  =  queryNorm


■  0.012720046  =  (MATCH)  fieldWeight(content:'fLS in  68), 
product  of: 

■  2.4494898  =  tf(termFreq(content:'fcS  j^')=6) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0048828125  =  fieldNorm(field=content,  doc=68) 

o  0.059250325  =  (MATCH)  weight(title:V^J^'^1.5  in  68),  product  of:

■  0.23934121  =  queryWeight(title:^1.5),  product  of:

■  1.5  =  boost

■  4.4812403  =  idf(docFreq=3,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.24755588  =  (MATCH)  fieldWeight(title:^j^1  in  68),  product
of: 

■  1.4142135  =  tf(termFreq(title:'^  j^')=2) 

■  4.4812403  =  idf(docFreq=3,  numDocs=130) 

■  0.0390625  =  fieldNorm(field=title,  doc=68) 

Page  2: 

.  boost  =  0.16184442 

•  digest  =  0f454ab63865ae2e08003bb23896bfad 

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  ju  ‘-ks  j-A  -  Being  Muslim  in  America  -  America.gov 
.  tstamp  =  20100305231009575 

•  url  =  http://www.america.gov/ar/publications/books-content/musliminamerica.html

score  for  query: 

•  0.11078926  =  (MATCH)  sum  of:

o  0.059957497  =  (MATCH)  weight(anchor:VcsJ^'^2.0  in  72),  product  of:

■  0.261373  =  queryWeight(anchor:'^j^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.22939436  =  (MATCH)  fieldWeight( anchor:'^ in  72), 
product  of: 

■  1.0  =  tf(termFreq(anchor:'fL5  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.0625  =  fieldNorm(field=anchor,  doc=72) 

o  5.5619737E-4  =  (MATCH)  weight(content: in  72),  product  of: 

■  0.037867878  =  queryWeight(content:^J^'),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.014687842  =  (MATCH)  fieldWeight(content:'fLS in  72), 
product  of: 

■  2.828427  =  tf(termFreq(content:  j^')=8) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0048828125  =  fieldNorm(field=content,  doc=72) 

o  0.050275568  =  (MATCH)  weight(title:'fcSj‘-^^1.5  in  72),  product  of:

■  0.23934121  =  queryWeight(title:^1.5),  product  of:

■  1.5  =  boost

■  4.4812403  =  idf(docFreq=3,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.21005814  =  (MATCH)  fieldWeight(title:V^ in  72),  product 
of: 

■  1.0  =  tf(termFreq(title:'(%5  j^')=1)

■  4.4812403  =  idf(docFreq=3,  numDocs=130) 

■  0.046875  =  fieldNorm(field=title,  doc=72) 
Page  3:

•  boost  =  0.23032264

•  digest  =  ce4a12d589c1a56e886d5b6848609391

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  l<J 'JVcs 'J -  America.gov 
.  tstamp  =  20100305230939904 

•  url  =  http://www.america.gov/ar/amlife.html 

score  for  query: 

.  0.105654  =  (MATCH)  sum  of: 

o  0.10492562  =  (MATCH)  weight(anchor:'^j^^2.0  in  3),  product  of:

■  0.261373  =  queryWeight(anchor:'^j^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.40144014  =  (MATCH)  fieldWeight( anchor:'^ in  3),  product 
of: 

■  1.0  =  tf(termFreq(anchor:'fc5  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.109375  =  fieldNorm(field=anchor,  doc=3) 

o  7.283851E-4  =  (MATCH)  weight(content:'^J^'  in  3),  product  of:

■  0.037867878  =  queryWeight(content:'f>c5j^^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.019234907  =  (MATCH)  fieldWeight(content:'^j^'  in  3), 
product  of: 

■  2.6457512  =  tf(termFreq(content:'fLS  j^')=7)

■  1.0635134  =  idf(docFreq=121,  numDocs=130)

■  0.0068359375  =  fieldNorm(field=content,  doc=3) 

Page  4: 

.  boost  =  0.15872316 

•  digest  =  dcfeb490d3db633d16bfb0588d67076d

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  J  !lSASl>“U  -  jU  J  IlSASlAj  j 

'(To j>  PDA  -  America.gov 

.  tstamp  =  20100305230942596 

•  url  =  http://www.america.gov/ar/services/mobile.html 

score  for  query: 

.  0.042377986  =  (MATCH)  sum  of: 

o  4.8168114E-4  =  (MATCH)  weight(content:  in  114),  product  of:

■  0.037867878  =  queryWeight(content:f>L5j^^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.012720046  =  (MATCH)  fieldWeight(content:'fcS in  114),
product  of: 

■  2.4494898  =  tf(termFreq(content:'fcS  j^')=6) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0048828125  =  fieldNorm(field=content,  doc=114)

o  0.041896306  =  (MATCH)  weight(title:'^j^^1.5  in  114),  product  of:

■  0.23934121  =  queryWeight(title:'^j^^1.5),  product  of:

■  1.5  =  boost

■  4.4812403  =  idf(docFreq=3,  numDocs=130) 

■  0.035606395  =  queryNorm 
■  0.17504844  =  (MATCH)  fieldWeight(title: in  114),  product
of:

■  1.0  =  tf(termFreq(title:^  j^')=1)

■  4.4812403  =  idf(docFreq=3,  numDocs=130)

■  0.0390625  =  fieldNorm(field=title,  doc=114)

Page  5: 

•  boost  =  0.04832446 

•  digest  =  87c8a44e7bc9cb3221f6823da385f8dd 

•  lang  =  ar 

•  segment  =  20100305181031 

•  title  =  'Jm jf  u^jj  -  s-1'  Jl>=> jj  -  America.gov 

•  tstamp  =  20100305231143209

•  url  =  http://www.america.gov/ar/multimedia/photogallery.html#/4110/mosques_ar/

score  for  query: 

.  0.022628564  =  (MATCH)  sum  of: 

o  0.02248406  =  (MATCH)  weight(anchor:'fL5j^^2.0  in  50),  product  of:

■  0.261373  =  queryWeight(anchor:'^ j^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.08602288  =  (MATCH)  fieldWeight( anchor: in  50), 
product  of: 

■  1.0  =  tf(termFreq(anchor:'fc5  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.0234375  =  fieldNorm(field=anchor,  doc=50) 

o  1.4450435E-4  =  (MATCH)  weight(content: in  50),  product  of: 
■  0.037867878  =  queryWeight(content:^j^'),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.0038160137  =  (MATCH)  fieldWeight(content:'fL5j^'  in  50), 
product  of: 

■  2.4494898  =  tf(termFreq(content:'fLS  j^')=6) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0014648438  =  fieldNorm(field=content,  doc=50) 

Page  6: 

.  boost  =  0.033444975 

•  digest  =  7212084a79cd19adbfc07dc50d3c0ea4

•  lang  =  ar 

•  segment  =  20100305181031 

•  title  =  -  tiuu  .  America.gov 

•  tstamp  =  20100305231138931

•  url  =  http://www.america.gov/ar/publications/books.html#beingmuslim

score  for  query: 

.  0.015091554  =  (MATCH)  sum  of: 

o  0.014989374  =  (MATCH)  weight(anchor:l^j«i3l^2.0  in  75),  product  of:

■  0.261373  =  queryWeight(anchor:'^ j^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.05734859  =  (MATCH)  fieldWeight( anchor: in  75), 
product  of: 

■  1.0  =  tf(termFreq(anchor:'fc5  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130)

■  0.015625  =  fieldNorm(field=anchor,  doc=75)


o  1.0217999E-4  =  (MATCH)  weight(content: in  75),  product  of: 

■  0.037867878  =  queryWeight(content:^j^'),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.002698329  =  (MATCH)  fieldWeight(content:'^j^'  in  75), 
product  of: 

■  3.4641016  =  tf(termFreq(content:  j^')=12)

■  1.0635134  =  idf(docFreq=121,  numDocs=130)

■  7.324219E-4  =  fieldNorm(field=content,  doc=75)

Page  7: 

•  boost  =  0.0420541 

.  digest  =  3354b6239b6eb27b9d241073f88fc34e 

•  lang  =  ar 

•  segment  =  20100305181031 

•  title  =  'Jm jf  sj' Jl>=> jj  -  America.gov 

•  tstamp  =  20100305231110244

•  url  =  http://www.america.gov/ar/multimedia/photogallery.html#/4110/religious_freedom_ar/

score  for  query: 

•  0.01136245  =  (MATCH)  sum  of:

o  0.01124203  =  (MATCH)  weight(anchor:'^j^^2.0  in  52),  product  of:

■  0.261373  =  queryWeight(anchor:'^ _>^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.035606395  =  queryNorm 
■  0.04301144  =  (MATCH)  fieldWeight(anchor:'^ in  52),
product  of:

■  1.0  =  tf(termFreq(anchor:'fc5  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130)

■  0.01171875  =  fieldNorm(field=anchor,  doc=52)

o  1.20420285E-4  =  (MATCH)  weight(content:^j^'  in  52),  product  of:

■  0.037867878  =  queryWeight(content:^j^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.0031800114  =  (MATCH)  fieldWeight(content:'fcS in  52),
product  of: 

■  2.4494898  =  tf(termFreq(content:'fc5  j^')=6) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0012207031  =  fieldNorm(field=content,  doc=52) 

Page  8: 

•  boost  =  0.02675021

•  digest  =  d4493509fb1e3146c2003310c9b70cbd

•  lang  =  ar 

•  segment  =  20100305181330 

•  title  =  -  America.gov 

.  tstamp  =  20100305231409034 

•  url  =  http://www.america.gov/ar/publications/books.html#governed

score  for  query: 

•  0.01132718  =  (MATCH)  sum  of:

o  0.01124203  =  (MATCH)  weight(anchor:'^j^^2.0  in  77),  product  of:

■  0.261373  =  queryWeight(anchor:'^j^'^2.0),  product  of:

■  2.0  =  boost

■  3.6703098  =  idf(docFreq=8,  numDocs=130)

■  0.035606395  =  queryNorm 

■  0.04301144  =  (MATCH)  fieldWeight(anchor:'^ in  77),
product  of:

■  1.0  =  tf(termFreq(anchor:'fc5  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130)

■  0.01171875  =  fieldNorm(field=anchor,  doc=77)

o  8.514999E-5  =  (MATCH)  weight(content:'^J^'  in  77),  product  of:

■  0.037867878  =  queryWeight(content:^j'-^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.0022486076  =  (MATCH)  fieldWeight(content:'^j^'  in  77), 
product  of: 

■  3.4641016  =  tf(termFreq(content:  j^')=12)

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  6.1035156E-4  =  fieldNorm(field=content,  doc=77) 

Page  9: 

•  boost  =  0.025411258

•  digest  =  6b7361561b7255632af783ca69a88410

•  lang  =  ar 

•  segment  =  20100305181330 

•  title  =  'Jm jf  s->'  Ji_k> jj  -  America.gov 

•  tstamp  =  20100305231411435

•  url  =  http://www.america.gov/ar/multimedia/photogallery.html#/4110/islam_ar/

score  for  query: 

•  0.011314282  =  (MATCH)  sum  of:

o  0.01124203  =  (MATCH)  weight(anchor:'fLSj‘-^^2.0  in  49),  product  of:

■  0.261373  =  queryWeight(anchor:'^j^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6703098  =  idf(docFreq=8,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.04301144  =  (MATCH)  fieldWeight(anchor:'^ in  49),
product  of:

■  1.0  =  tf(termFreq(anchor:'fL5  j^')=1)

■  3.6703098  =  idf(docFreq=8,  numDocs=130)

■  0.01171875  =  fieldNorm(field=anchor,  doc=49)

o  7.225217E-5  =  (MATCH)  weight(content:'fLSj^'  in  49),  product  of:

■  0.037867878  =  queryWeight(content:^j^'),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.0019080068  =  (MATCH)  fieldWeight(content:'fcS in  49),
product  of: 

■  2.4494898  =  tf(termFreq(content:'fc5  j^')=6) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  7.324219E-4  =  fieldNorm(field=content,  doc=49)

Page  10: 

•  boost  =  1.0000145 

.  digest  =  0d5b023c802941ddb358071073a98833 

•  lang  =  ar 

•  segment  =  20100305180856 

•  title  =  ' J' jJ^  -  IJl jJ<_ s  -  America.gov 

.  tstamp  =  20100305230902835 

•  url  =  http://www.america.gov/ar/ 

score  for  query: 
•  0.0030827592  =  (MATCH)  sum  of:

o  0.0030827592  =  (MATCH)  weight(content:  ^ in  0),  product  of:

■  0.037867878  =  queryWeight(content:'^j^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035606395  =  queryNorm 

■  0.08140829  =  (MATCH)  fieldWeight(content:'^j^'  in  0),  product 
of: 

■  2.4494898  =  tf(termFreq(content:'fLS  j^)=6) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.03125  =  fieldNorm(field=content,  doc=0) 
APPENDIX  D


This  appendix  lists  the  detailed  query  scores  for  the  top  10  pages  using  NutchDocumentAnalyzer.
Search  Term:  (America) 

Page  1: 

•  boost  =  0.1580853 

.  digest  =  65d01f780ed747de9fd07241fb39df44 

•  lang  =  ar 

•  segment  =  20100307101231 

•  title  =  £ j J  ? s:3c  Vs? t >-®  -  cjJ  f  Jii£.  Vs?  ^  t  >-*  -  America.gov 
.  tstamp  =  20100307151249455 

•  url  =  http://www.america.gov/ar/pages/footer/local/about-us.html 
score  for  query:  Vs?>^' 

•  0.11997691  =  (MATCH)  sum  of:

o  0.060125146  =  (MATCH)  weight(anchor:Vs? J^'^2.0  in  69),  product  of:

■  0.26155776  =  queryWeight(anchor:Vs? J^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.2298733  =  (MATCH)  fieldWeight(anchor:'fs?j^'  in  69),  product 
of: 

■  1.0  =  tf(termFreq(anchor:Vs?  J^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.0625  =  fieldNorm(field=anchor,  doc=69) 

o  4.8056475E-4  =  (MATCH)  weight(content:Vs?  in  69),  product  of: 

■  0.037797898  =  queryWeight(content:^j1^1),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.035557326  =  queryNorm

■  0.01271406  =  (MATCH)  fieldWeight(content:'^ in  69), 
product  of: 

■  2.4494898  =  tf(termFreq(  content:  j^')=6) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0048828125  =  fieldNorm(field=content,  doc=69) 
o  0.0593712  =  (MATCH)  weight(title:'^j^^1.5  in  69),  product  of:

■  0.23942009  =  queryWeight(title:^  j^'^1.5),  product  of:

■  1.5  =  boost

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.2479792  =  (MATCH)  fieldWeight(title:V^J^'  in  69),  product  of: 

■  1.4142135  =  tf(termFreq(title:V^  j^)=2) 

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.0390625  =  fieldNorm(field=title,  doc=69) 
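
The idf values listed here differ slightly from those in Appendix C because the index used for this run holds 131 documents, against 130 in Appendix C. The Python sketch below is illustrative and not part of the original experiment. It recomputes the reported idf values under the assumption that idf = 1 + ln(numDocs / (docFreq + 1)), as in Lucene's classic similarity; the table layout and function names are hypothetical.

```python
import math


def idf(doc_freq, num_docs):
    # Assumed classic Lucene inverse document frequency.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# (field, docFreq, numDocs, idf value as reported in the appendices)
REPORTED = [
    ("anchor", 8, 130, 3.6703098),    # Appendix C
    ("title", 3, 130, 4.4812403),     # Appendix C
    ("content", 121, 130, 1.0635134), # Appendix C
    ("anchor", 8, 131, 3.6779728),    # Appendix D
    ("title", 3, 131, 4.488903),      # Appendix D
    ("content", 122, 131, 1.063013),  # Appendix D
]


def max_idf_error(rows):
    # Largest absolute gap between the recomputed and the reported idf.
    return max(abs(idf(df, n) - reported) for _, df, n, reported in rows)


for field, df, n, reported in REPORTED:
    print(f"{field:8s} docFreq={df:3d} numDocs={n} "
          f"recomputed={idf(df, n):.7f} reported={reported}")
print("largest gap:", max_idf_error(REPORTED))
```

A rare term (low docFreq, as in the anchor and title fields) receives a much larger idf than a term that occurs in almost every document (the content field), which is why the anchor and title clauses dominate the page scores in these listings.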

Page  2: 

.  boost  =  0.16184442 

•  digest  =  be96f39b462a546d99cbfa50ba70c710 

•  lang  =  ar 

•  segment  =  20100307101102

•  title  =  ju  -  Being  Muslim  in  America  -  America.gov 

.  tstamp  =  20100307151207698 

•  url  =  http://www.america.gov/ar/publications/books-content/musliminamerica.html

score  for  query: 

•  0.11105819  =  (MATCH)  sum  of:

o  0.060125146  =  (MATCH)  weight(anchor:V^J^'^2.0  in  73),  product  of:

■  0.26155776  =  queryWeight(anchor:V^J^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.2298733  =  (MATCH)  fieldWeight(anchor:'f>L5j^'  in  73),  product 
of: 

■  1.0  =  tf(termFreq(anchor:'fL5  j^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.0625  =  fieldNorm(field=anchor,  doc=73) 

o  5.5490836E-4  =  (MATCH)  weight(content:V^J^'  in  73),  product  of: 

■  0.037797898  =  queryWeight(content:^j^'),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.014680931  =  (MATCH)  fieldWeight(content:'^^'  in  73),
product  of: 

■  2.828427  =  tf(termFreq(content:VeS  j^')=8) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0048828125  =  fieldNorm(field=content,  doc=73) 

o  0.050378136  =  (MATCH)  weight(title:V^  j^|^1.5  in  73),  product  of:

■  0.23942009  =  queryWeight(title:  j^'^1.5),  product  of:

■  1.5  =  boost

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.21041733  =  (MATCH)  fieldWeight(title:V^  in  73),  product 
of: 

■  1.0  =  tf(termFreq(title:^  j^')=1)

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.046875  =  fieldNorm(field=title,  doc=73) 
Page  3:

•  boost  =  0.23039404

•  digest  =  8ed8fcd743ff1ce5d4c42db83fc549af

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  IJW -  America.gov 
•  tstamp  =  20100307151136760

•  url  =  http://www.america.gov/ar/amlife.html 

score  for  query: 

.  0.10594571  =  (MATCH)  sum  of: 

o  0.10521901  =  (MATCH)  weight(anchor:V^J^^2.0  in  3),  product  of:

■  0.26155776  =  queryWeight(anchor:V^J^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.40227827  =  (MATCH)  fieldWeight(anchor:'^J1-^  in  3),  product
of:

■  1.0  =  tf(termFreq(anchor:'fL5  j^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.109375  =  fieldNorm(field=anchor,  doc=3) 

o  7.2669686E-4  =  (MATCH)  weight(content:V^J^'  in  3),  product  of: 

■  0.037797898  =  queryWeight(content:^j^'),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.019225854  =  (MATCH)  fieldWeight(content:'^J1-^  in  3),
product  of:

■  2.6457512  =  tf(termFreq(content:Vs?  j^')=7)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0068359375  =  fieldNorm(field=content,  doc=3) 

Page  4: 

•  boost  =  0.1587577

•  digest  =  a5795145f4a839cf52528d1b49e03bd1

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  ^3*“  J  j'1 J  J  J 

j®  PDA  -  America.gov 

•  tstamp  =  20100307151139253

•  url  =  http://www.america.gov/ar/services/mobile.html 

score  for  query:  Vs?>^' 

.  0.042462345  =  (MATCH)  sum  of: 

o  4.8056475E-4  =  (MATCH)  weight(content:Vs?J^'  in  115),  product  of: 

■  0.037797898  =  queryWeight(content:^j^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.035557326  =  queryNorm

■  0.01271406  =  (MATCH)  fieldWeight(content:'^ in  115),
product  of: 

■  2.4494898  =  tf(termFreq(content:Vs?  j^')=6) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0048828125  =  fieldNorm(field=content,  doc=115)

o  0.04198178  =  (MATCH)  weight(title:Vs?  j^'^1.5  in  115),  product  of:

■  0.23942009  =  queryWeight(title:Vs?  J^'^1.5),  product  of:

■  1.5  =  boost

■  4.488903  =  idf(docFreq=3,  numDocs=131) 
■  0.035557326  =  queryNorm

■  0.17534778  =  (MATCH)  fieldWeight(title:V^J^'  in  115),  product
of:

■  1.0  =  tf(termFreq(title:^  j^')=1)

■  4.488903  =  idf(docFreq=3,  numDocs=131)

■  0.0390625  =  fieldNorm(field=title,  doc=115)

Page  5: 

•  boost  =  0.04832446 

•  digest  =  22e534f031e9c7ac8682fcd4f86523e4

•  lang  =  ar 

•  segment  =  20100307101231 

•  title  =  iJs-1  jf  UifLf  jA/*  Jo*3  jj  -  America.gov 

•  tstamp  =  20100307151334977

•  url  =  http://www.america.gov/ar/multimedia/photogallery.html#/4110/mosques_ar/

score  for  query: 

.  0.022691099  =  (MATCH)  sum  of: 

o  0.02254693  =  (MATCH)  weight(anchor:^Jlii|^2.0  in  51),  product  of:

■  0.26155776  =  queryWeight(anchor:^J^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.08620249  =  (MATCH)  fieldWeight(anchor:^ in  51),
product  of:

■  1.0  =  tf(termFreq(anchor:^  j^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.0234375  =  fieldNorm(field=anchor,  doc=51) 
o  1.4416942E-4  =  (MATCH)  weight(content:'^ in  51),  product  of:

■  0.037797898  =  queryWeight(content:^j^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.0038142179  =  (MATCH)  fieldWeight(content:'^ in  51), 
product  of: 

■  2.4494898  =  tf(termFreq(content:'^  j^')=6) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0014648438  =  fieldNorm(field=content,  doc=51) 

Page  6: 

.  boost  =  0.033444975 

•  digest  =  80c97402726fad635131db1bb29555be

•  lang  =  ar 

•  segment  =  20100307101231 

•  title  =  -  America.gov 

•  tstamp  =  20100307151330985

•  url  =  http://www.america.gov/ar/publications/books.html#beingmuslim

score  for  query: 

.  0.01513323  =  (MATCH)  sum  of: 

o  0.0150312865  =  (MATCH)  weight(anchor:V^J^'^2.0  in  76),  product  of:

■  0.26155776  =  queryWeight(anchor:'^J^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.057468325  =  (MATCH)  fieldWeight(anchor:V^J^'  in  76), 
product  of: 

■  1.0  =  tf(termFreq(anchor:'^  j^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131)

■  0.015625  =  fieldNorm(field=anchor,  doc=76) 

o  1.01943166E-4  =  (MATCH)  weight(content:Vc5J^'  in  76),  product  of:

■  0.037797898  =  queryWeight(content:^j^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.0026970592  =  (MATCH)  fieldWeight(content:V^J^'  in  76), 
product  of: 

■  3.4641016  =  tf(termFreq(content:'^  j^')=12) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  7.324219E-4  =  fieldNorm(field=content,  doc=76)

Page  7: 

•  boost  =  0.0420541 

•  digest  =  1e1bc6ad9ffbfcdb82ea012b44610bed

•  lang  =  ar 

•  segment  =  20100307101231 

•  title  =  iJs-1  jf  t<Ji ^ s->'  Ji>=> jj  -  America.gov 

•  tstamp  =  20100307151307072

•  url  =  http://www.america.gov/ar/multimedia/photogallery.html#/4110/religious_freedom_ar/

score  for  query: 

•  0.011393607  =  (MATCH)  sum  of:

o  0.011273465  =  (MATCH)  weight(anchor:V^j^'^2.0  in  53),  product  of:

■  0.26155776  =  queryWeight(anchor:V^J^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 
■  0.035557326  =  queryNorm

■  0.043101244  =  (MATCH)  fieldWeight(anchor:VeS in  53),
product  of:

■  1.0  =  tf(termFreq(anchor:'^  j^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131)

■  0.01171875  =  fieldNorm(field=anchor,  doc=53)

o  1.2014119E-4  =  (MATCH)  weight(content:V^  in  53),  product  of:

■  0.037797898  =  queryWeight(content:f>L5j^^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.003178515  =  (MATCH)  fieldWeight(content:Vc5J^'  in  53), 
product  of: 

■  2.4494898  =  tf(termFreq(content:'^  j^')=6) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0012207031  =  fieldNorm(field=content,  doc=53) 

Page  8: 

•  boost  =  0.02675021 

•  digest  =  df2eeaef879a60aaaddf7c8403cba7fa 

•  lang  =  ar 

•  segment  =  20100307101458 

•  title  =  .  tiuu  .  America.gov 

•  tstamp  =  20100307151541037

•  url  =  http://www.america.gov/ar/publications/books.html#governed

score  for  query: 

•  0.011358418  =  (MATCH)  sum  of:

o  0.011273465  =  (MATCH)  weight(anchor:V^J^'^2.0  in  78),  product  of:

■  0.26155776  =  queryWeight(anchor:'^J^'^2.0),  product  of:

■  2.0  =  boost


■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.043101244  =  (MATCH)  fieldWeight(anchor:'^ in  78), 
product  of: 

■  1.0  =  tf(termFreq(anchor:'fL5  j^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131)

■  0.01171875  =  fieldNorm(field=anchor,  doc=78)

o  8.495264E-5  =  (MATCH)  weight(content:Vc5J^'  in  78),  product  of: 

■  0.037797898  =  queryWeight(content:^j^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.035557326  =  queryNorm 

■  0.0022475494  =  (MATCH)  fieldWeight(content:'^ in  78), 
product  of: 

■  3.4641016  =  tf(termFreq(content:'^  j^')=12) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  6.1035156E-4  =  fieldNorm(field=content,  doc=78) 

Page  9: 

•  boost  =  0.025411258

•  digest  =  295971814b3454a9d44144054b5c194a

•  lang  =  ar 

•  segment  =  20100307101458 

•  title  =  jf  s-1' Jl>=> jj  -  America.gov 

•  tstamp  =  20100307151543423

•  url  =  http://www.america.gov/ar/multimedia/photogallery.html#/4110/islam_ar/

score  for  query: 

•  0.0113455495  =  (MATCH)  sum  of:

o  0.011273465  =  (MATCH)  weight(anchor:V^J^'^2.0  in  50),  product  of:

■  0.26155776  =  queryWeight(anchor:'^J^'^2.0),  product  of:

■  2.0  =  boost 

■  3.6779728  =  idf(docFreq=8,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.043101244  =  (MATCH)  fieldWeight(anchor:Vc5 in  50),
product  of:

■  1.0  =  tf(termFreq(anchor:'^  j^')=1)

■  3.6779728  =  idf(docFreq=8,  numDocs=131)

■  0.01171875  =  fieldNorm(field=anchor,  doc=50)

o  7.208471E-5  =  (MATCH)  weight(content:V^J1^1  in  50),  product  of:

■  0.037797898  =  queryWeight(content:^j^^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.0019071089  =  (MATCH)  fieldWeight(content:V^J^'  in  50), 
product  of: 

■  2.4494898  =  tf(termFreq(content:'^  j^')=6) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  7.324219E-4  =  fieldNorm(field=content,  doc=50)

Page  10: 

•  boost  =  1.0000145 

•  digest  =  eed4dd9817b50ffda0aef158be6e4c12

•  lang  =  ar 

•  segment  =  20100307101052 

•  title  =  IJi jJ lS  -  America.gov 

•  tstamp  =  20100307151057483

•  url  =  http://www.america.gov/ar/

score  for  query:

.  0.0028076388  =  (MATCH)  sum  of: 

o  0.0028076388  =  (MATCH)  weight(content:Vc5J^'  in  0),  product  of: 

■  0.037797898  =  queryWeight(content:^j^),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.035557326  =  queryNorm 

■  0.07428029  =  (MATCH)  fieldWeight(content:'^j^'  in  0),  product 
of: 

■  2.236068  =  tf(termFreq(content:Vc5  J^')=5) 

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.03125  =  fieldNorm(field=content,  doc=0) 
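
The single-clause breakdown for Page 10 can be checked by hand. The following Python sketch assumes the formulas of Lucene's classic default similarity, tf = sqrt(termFreq) and idf = 1 + ln(numDocs / (docFreq + 1)), which agree with the printed factors. The function names are illustrative only and are not part of Nutch or Lucene; queryNorm and fieldNorm are taken as printed rather than recomputed.

```python
import math

def tf(term_freq):
    """Term-frequency factor: square root of the raw count (assumed classic Lucene formula)."""
    return math.sqrt(term_freq)

def idf(doc_freq, num_docs):
    """Inverse document frequency: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def clause_weight(term_freq, doc_freq, num_docs, field_norm, query_norm, boost=1.0):
    """Score of one term clause: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, num_docs)
    query_weight = boost * term_idf * query_norm          # boost * idf * queryNorm
    field_weight = tf(term_freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the Page 10 record (content field, doc 0).
page10_score = clause_weight(term_freq=5, doc_freq=122, num_docs=131,
                             field_norm=0.03125, query_norm=0.035557326)
print(f"{page10_score:.10f}")
```

The result agrees with the reported score of 0.0028076388 to within single-precision rounding.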
APPENDIX  E


This is the detailed query score for each of the top 10 pages using the Arabic Analyzer.

Search  Term:  (Democratic) 

Page  1: 

•  boost  =  0.16689056 

•  digest  =  6e1b0463970c5b60bb75636a698cf1b3

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  -  America.gov 

•  tstamp  =  20100305230951886

•  url  =  http://www.america.gov/ar/global/democracy.html 
score  for  query:  ^  l  b 

•  0.2665834  =  (MATCH)  sum  of: 

o  0.15995954  =  (MATCH)  weight(anchor:A^dj^^2.0  in  23),  product  of:

■  0.30052778  =  queryWeight(anchor:-\s?iijL^^2.0),  product  of:

■  2.0  =  boost 

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.5322621  =  (MATCH)  fieldWeight( anchor: in  23), 
product  of: 

■  1.0  =  tf(termFreq(anchor:A^iij'-^)=1)

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.125  =  fieldNorm(field=anchor,  doc=23) 

o  5.846775E-4  =  (MATCH)  weight(content:A^dj^3  in  23),  product  of:

■  0.037530307  =  queryWeight(content:-Cs?dj'-^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 
■  0.035288982  =  queryNorm


■  0.01557881  =  (MATCH)  fieldWeight(content:^iij'-^  in  23), 
product  of: 

■  3.0  =  tf(termFreq(content:A5(aiijl^,)=9) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0048828125  =  fieldNorm(field=content,  doc=23) 

o  0.1060392  =  (MATCH)  weight(title:As?iijd=>^1.5  in  23),  product  of:

■  0.22539584  =  queryWeight(title:As?iijd=>^1.5),  product  of:

■  1.5  =  boost

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.47045767  =  (MATCH)  fieldWeight(title:As?iijd=>  in  23),  product 
of: 

■  1.4142135  =  tf(termFreq(title:As?iijd=)=2) 

■  4.2580967  =  idf(docFreq=4,  numDocs=130)

■  0.078125  =  fieldNorm(field=title,  doc=23) 
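
The Page 1 total is the sum of three boosted clauses, one each for the anchor, content, and title fields. As a rough check, the sketch below recomputes each clause from the raw counts printed in the record. It assumes the classic Lucene tf and idf formulas (square root and 1 + ln(numDocs / (docFreq + 1))); the names `CLAUSES` and `clause_score` are hypothetical and exist only for this illustration.

```python
import math

QUERY_NORM = 0.035288982  # queryNorm as printed in the Page 1 record

# (field, boost, termFreq, docFreq, numDocs, fieldNorm), copied from Page 1.
CLAUSES = [
    ("anchor",  2.0, 1, 4,   130, 0.125),
    ("content", 1.0, 9, 121, 130, 0.0048828125),
    ("title",   1.5, 2, 4,   130, 0.078125),
]

def clause_score(boost, term_freq, doc_freq, num_docs, field_norm):
    """queryWeight * fieldWeight for a single field clause (assumed formulas)."""
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(term_freq) * idf * field_norm
    return query_weight * field_weight

per_field = {field: clause_score(*params) for field, *params in CLAUSES}
total = sum(per_field.values())

for field, score in per_field.items():
    print(f"{field:8s}{score:.8f}")
print(f"{'total':8s}{total:.8f}")
```

Under these assumptions the anchor and title clauses account for almost all of the total, while the content clause contributes well under one percent, even though the term occurs nine times in the content field.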

Page  2: 

•  boost  =  0.23113073

•  digest  =  5285dc46473be73851750b409de012a5

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  -  'J^As  i<j£)<JpL?  -  America.gov 

•  tstamp  =  20100305230921085

•  url  =  http://www.america.gov/ar/global.html 
score  for  query:  ^  p j j  l  h> 

•  0.16062789  =  (MATCH)  sum  of:

o  0.15995954  =  (MATCH)  weight(anchor:As,\ijd=>^2.0  in  22),  product  of:

■  0.30052778  =  queryWeight(anchor:ASfiijd=^2.0),  product  of:

■  2.0  =  boost 

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.5322621  =  (MATCH)  fieldWeight( anchor: in  22), 
product  of: 

■  1.0  =  tf(termFreq(anchor:As,»iijd=)=1)

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.125  =  fieldNorm(field=anchor,  doc=22) 

o  6.683421E-4  =  (MATCH)  weight(content:As<»iijd=>  in  22),  product  of:

■  0.037530307  =  queryWeight(content:As?iijd=>),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.017808065  =  (MATCH)  fieldWeight(content:As?iijd=>  in  22), 
product  of: 

■  2.4494898  =  tf(termFreq(content:As?iijd=)=6)

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0068359375  =  fieldNorm(field=content,  doc=22) 

Page  3: 

•  boost  =  0.031816483

•  digest  =  bba906c38386b2e71f42a4f7d365e8cb 

•  lang  =  ar 

•  segment  =  20100305181031 

•  title  =  U'u* j'ii  -  America.gov 

•  tstamp  =  20100305231058196

•  url  =  http://www.america.gov/ar/publications/ejournalusa/608.html
score  for  query:  ^  l  b 

•  0.033635326  =  (MATCH)  sum  of: 
o  0.017495574  =  (MATCH)  weight(anchor:As?l3j'-^^2.0  in  98),  product  of:

■  0.30052778  =  queryWeight(anchor:As,\ijda^2.0),  product  of:

■  2.0  =  boost 

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.058216166  =  (MATCH)  fieldWeight(anchor:^iij'-^  in  98), 
product  of: 

■  1.0  =  tf(termFreq(anchor:^,»lij'3a)=1)

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.013671875  =  fieldNorm(field=anchor,  doc=98) 

o  2.33871E-4  =  (MATCH)  weight(content: in  98),  product  of:

■  0.037530307  =  queryWeight(content:A5fijj'-^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.006231524  =  (MATCH)  fieldWeight(content:  in  98),
product  of:

■  6.0  =  tf(termFreq(content:AS(aiij'^“)=36)

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  9.765625E-4  =  fieldNorm(field=content,  doc=98) 

o  0.015905881  =  (MATCH)  weight(title:As?iijd=^1.5  in  98),  product  of:

■  0.22539584  =  queryWeight(title:As?jjda^1.5),  product  of:

■  1.5  =  boost

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.07056865  =  (MATCH)  fieldWeight(title:As?iijd=>  in  98),  product 
of: 

■  1.4142135  =  tf(termFreq(title:ASr‘iijd=)=2) 

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 
■  0.01171875  =  fieldNorm(field=title,  doc=98)

Page  4: 

•  boost  =  0.11378951

•  digest  =  b8c157220365a4bf104bc045832885be

•  lang  =  ar

•  segment  =  20100305180909

•  title  =  j  i-i is  -  0110  -  America.gov

•  tstamp  =  20100305231013594

•  url  =  http://www.america.gov/ar/publications/ejournalusa/0110.html
score  for  query:  ^  l  b 

•  0.030587077  =  (MATCH)  sum  of: 

o  5.9466175E-4  =  (MATCH)  weight(content:As?iijd=  in  88),  product  of: 

■  0.037530307  =  queryWeight(content:As?iijd=),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.01584484  =  (MATCH)  fieldWeight(content:ASfiij'-^  in  88),
product  of: 

■  4.358899  =  tf(termFreq(content:^tijd=>)=19) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0034179688  =  fieldNorm(field=content,  doc=88) 

o  0.029992415  =  (MATCH)  weight(title:As?iijd=^1.5  in  88),  product  of:

■  0.22539584  =  queryWeight(title:^tiJ^“^1.5),  product  of:

■  1.5  =  boost

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.13306552  =  (MATCH)  fieldWeight(title:^fiij^3  in  88),  product
of:

■  1.0  =  tf(termFreq(title:A^LL>'-^)=1)

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.03125  =  fieldNorm(field=title,  doc=88) 

Page  5: 

•  boost  =  0.028445216 

•  digest  =  c04e43d37fb6f380a397373427882a1e

•  lang  =  ar

•  segment  =  20100305181151

•  title  =  fiiV  -  IJgdJp  |  jd=uju  c J  -  America.gov

•  tstamp  =  20100305231252420

•  url  =  http://www.america.gov/ar/democracy/global/index.html 
score  for  query:  ^  l  b 

•  0.027611194  =  (MATCH)  sum  of:

o  0.019994942  =  (MATCH)  weight(anchor:A^iijd=^2.0  in  14),  product  of:

■  0.30052778  =  queryWeight(anchor:-\Sfiijda^2.0),  product  of:

■  2.0  =  boost 

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.06653276  =  (MATCH)  fieldWeight(anchor:A^jjd=  in  14), 
product  of: 

■  1.0  =  tf(termFreq(anchor:^^tijl-l=)=1)

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.015625  =  fieldNorm(field=anchor,  doc=14) 

o  1.181473E-4  =  (MATCH)  weight(content:-\s?iijd=  in  14),  product  of: 

■  0.037530307  =  queryWeight(content:-\Sr\ijd=),  product  of: 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035288982  =  queryNorm 
■  0.0031480505  =  (MATCH)  fieldWeight(content:A^iij^3  in  14),
product  of:

■  3.4641016  =  tf(termFreq(content:ASj\ijd=)=12)

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  8.544922E-4  =  fieldNorm(field=content,  doc=14) 

o  0.0074981037  =  (MATCH)  weight(title:As?iijd=^1.5  in  14),  product  of:

■  0.22539584  =  queryWeight(title:As?3j^,^1.5),  product  of:

■  1.5  =  boost

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.03326638  =  (MATCH)  fieldWeight(title:As,\ijd=>  in  14),  product 
of: 

■  1.0  =  tf(termFreq(title:ASfiijd=>)=1)

■  4.2580967  =  idf(docFreq=4,  numDocs=130) 

■  0.0078125  =  fieldNorm(field=title,  doc=14) 

Page  6: 

•  boost  =  1.0000145 

•  digest  =  0d5b023c802941ddb358071073a98833

•  lang  =  ar 

•  segment  =  20100305180856 

•  title  =  I J jJlS  -  IJi jJ b  -  America.gov 

•  tstamp  =  20100305230902835

•  url  =  http://www.america.gov/ar/ 
score  for  query:  ^  j  l  b 

•  0.0021604078  =  (MATCH)  sum  of: 

o  0.0021604078  =  (MATCH)  weight(content:  in  0),  product  of:

■  0.037530307  =  queryWeight(content:A^iij'-^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130)

■  0.035288982  =  queryNorm


■  0.05756435  =  (MATCH)  fieldWeight(content:^^iij'-^  in  0), 

product  of: 

■  1.7320508  =  tf(termFreq(content:^lijd=>)=3) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.03125  =  fieldNorm(field=content,  doc=0) 

Page  7: 

•  boost  =  0.22860475 

•  digest  =  5a62cd3a20d5393ff5806fd92af1edef

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  Sr1. -  ss .  America.gov 


•  tstamp  =  20100305230934690

•  url  =  http://www.america.gov/ar/multimedia/podcast.html 
score  for  query:  ^  pjj  l  b 

•  6.1011006E-4  =  (MATCH)  sum  of:

o  6.1011006E-4  =  (MATCH)  weight(content:A^iij'-^  in  60),  product  of:

■  0.037530307  =  queryWeight(content:A^iij'-^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.016256463  =  (MATCH)  fieldWeight(content:ASfiij'-^  in  60),
product  of: 

■  2.236068  =  tf(termFreq(content:As?iijd=)=5) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0068359375  =  fieldNorm(field=content,  doc=60) 

Page  8: 

•  boost  =  0.22996004 

•  digest  =  a0130240b4348578aa8a83e59187dfb3 
•  lang  =  ar

•  segment  =  20100305180909 

•  title  =  .  tiuu  -  America.gov 

•  tstamp  =  20100305231001279

•  url  =  http://www.america.gov/ar/publications/books.html 
score  for  query:  ^  pjj  l  b 

•  5.846775E-4  =  (MATCH)  sum  of: 

o  5.846775E-4  =  (MATCH)  weight(content:ASfiij^3  in  73),  product  of:

■  0.037530307  =  queryWeight(content:As?iijd=>),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.01557881  =  (MATCH)  fieldWeight(content:ASfiijd=>  in  73), 
product  of: 

■  3.0  =  tf(termFreq(content:A5fiijd=)=9) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0048828125  =  fieldNorm(field=content,  doc=73) 

Page  9: 

•  boost  =  0.23032264 

•  digest  =  ce4a12d589c1a56e886d5b6848609391

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  ' J'fcS jA/*  - ' -  America.gov 

•  tstamp  =  20100305230939904

•  url  =  http://www.america.gov/ar/amlife.html 
score  for  query:  ^  l  b 

•  5.4569903E-4  =  (MATCH)  sum  of: 

o  5.4569903E-4  =  (MATCH)  weight(content:As,\ijd=>  in  3),  product  of: 

■  0.037530307  =  queryWeight(content:As?iijd=),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130)

■  0.035288982  =  queryNorm 

■  0.0145402225  =  (MATCH)  fieldWeight(content:ASfiij^3  in  3),
product  of: 

■  2.0  =  tf(termFreq(content:A5(aiij^a)=4) 

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.0068359375  =  fieldNorm(field=content,  doc=3) 

Page  10: 

•  boost  =  0.2296042 

•  digest  =  bc9c562d0a61b335f5a8730f14412dcb

•  lang  =  ar 

•  segment  =  20100305180909 

•  title  =  !<_s  £ jju'J  Vs  -  £ jju'J  'l>“  Vs  -  America.gov 

•  tstamp  =  20100305230924025

•  url  =  http://www.america.gov/ar/publications/ejournalusa.html
score  for  query:  ^  p j j  l  h> 

•  5.4010196E-4  =  (MATCH)  sum  of:

o  5.4010196E-4  =  (MATCH)  weight(content:^iij'-^  in  86),  product  of:

■  0.037530307  =  queryWeight(content:A^jj'-^),  product  of:

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.035288982  =  queryNorm 

■  0.014391088  =  (MATCH)  fieldWeight(content:A^iij'-^  in  86),
product  of:

■  3.4641016  =  tf(termFreq(content:ASj\ijd=)=12)

■  1.0635134  =  idf(docFreq=121,  numDocs=130) 

■  0.00390625  =  fieldNorm(field=content,  doc=86) 
APPENDIX  F


This is the detailed query score for each of the top 10 pages using the NutchDocumentAnalyzer.
Search  Term:  (Democratic) 

Page  1: 

•  boost  =  0.16692342

•  digest  =  c49d13e1fa4eb518258862a27800f398

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  -  America.gov

•  tstamp  =  20100307151151020

•  url  =  http://www.america.gov/ar/global/democracy.html 

score  for  query:  l  j  ^  p jj  l  5 

•  0.29354417  =  (MATCH)  sum  of:

o  0.17619587  =  (MATCH)  weight(anchor:'J^piij'^s^2.0  in  24),  product
of:

■  0.31401145  =  queryWeight(anchor:'J-\spjJ-^»^2.0),  product  of:

■  2.0  =  boost 

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.5611129  =  (MATCH)  fieldWeight(anchor:UA_s!^jjLK?°  in  24),
product  of:

■  1.0  =  tf(termFreq(anchor:'f\ijd=cs»)=1)

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.125  =  fieldNorm(field=anchor,  doc=24) 

o  5.458426E-4  =  (MATCH)  weight(content:'J-\Sfiijd=LsS  in  24),  product  of: 

■  0.03718038  =  queryWeight(content:'J-\^iijd=LS»),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.03497641  =  queryNorm 

■  0.014680931  =  (MATCH)  fieldWeight(content:'JAs(aiiJlKss  in  24), 
product  of: 

■  2.828427  =  tf(termFreq(content:'JAj^jjC^A)=8)

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.0048828125  =  fieldNorm(field=content,  doc=24)

o  0.116802454  =  (MATCH)  weight(title:'^1.5  in  24),  product  of:

■  0.23550858  =  queryWeight(title:'Jas?i3j'-HsS^1.5),  product  of:

■  1.5  =  boost

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.4959584  =  (MATCH)  fieldWeight(title:'JASfiijdvs  in  24), 
product  of: 

■  1.4142135  =  tf(termFreq(title:'JASfiijdv“)=2) 

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.078125  =  fieldNorm(field=title,  doc=24) 

Page  2: 

•  boost  =  0.23117816

•  digest  =  6017cffadc06ecf85513b0eb565f1b8

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  -  America.gov

•  tstamp  =  20100307151115329

•  url  =  http://www.america.gov/ar/global.html 
score  for  query:  l  J  ^  p jj  l  Ks  5 

•  0.17680001  =  (MATCH)  sum  of:

o  0.17619587  =  (MATCH)  weight(anchor:UAs?iiJ-Hs»^2.0  in  23),  product
of:

■  0.31401145  =  queryWeight(anchor:'J^^iij'^s^2.0),  product  of:

■  2.0  =  boost 

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.5611129  =  (MATCH)  fieldWeight(anchor:'dA^jjd=^e  in  23),
product  of:

■  1.0  =  tf(termFreq(anchor:'J^^tijO=>(^o)=1)

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.125  =  fieldNorm(field=anchor,  doc=23) 

o  6.041371E-4  =  (MATCH)  weight(content:U-^iij^cs®  in  23),  product  of:

■  0.03718038  =  queryWeight(content:'JA_s^jj'-Ayo),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.016248815  =  (MATCH)  fieldWeight(content:'JA^j'H^  in  23),
product  of: 

■  2.236068  =  tf(termFreq(content:'J^fiij'-l=L5“)=5) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0068359375  =  fieldNorm(field=content,  doc=23) 

Page  3: 

•  boost  =  0.11378951

•  digest  =  ab333ad468abf764c43637fe53b7e4f7

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  J  pu  <-*< J  I -  0110  -  America.gov

•  tstamp  =  20100307151214969

•  url  =  http://www.america.gov/ar/publications/ejournalusa/0110.html
score  for  query:  l  j  ^  p jj  I  5 

•  0.033559922  =  (MATCH)  sum  of:

o  5.2319805E-4  =  (MATCH)  weight(content:'JAsptij'-Ks®  in  89),  product  of:

■  0.03718038  =  queryWeight(content:'JAjpdj'-K?°),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.0140718855  =  (MATCH)  fieldWeight(content:'JAspiij'-Ks“  in 
89),  product  of: 

■  3.8729835  =  tf(termFreq(content:'JAsptij'-K;»)=15)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0034179688  =  fieldNorm(field=content,  doc=89) 

o  0.033036724  =  (MATCH)  weight(title:'JAspiijdv°^1.5  in  89),  product  of:

■  0.23550858  =  queryWeight(title:'JAspiij'-Hs»^1.5),  product  of:

■  1.5  =  boost

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.14027822  =  (MATCH)  fieldWeight(title:'JAspL3j'^  in  89),
product  of:

■  1.0  =  tf(termFreq(title:'JAspiijl\s“)=1)

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.03125  =  fieldNorm(field=title,  doc=89) 

Page  4: 

•  boost  =  0.028445216 

•  digest  =  9212154ec8740ad77458648f74aa149c

•  lang  =  ar 

•  segment  =  20100307101343 

•  title  =  ‘-ks  'J^' Jp  |  p j0=uju  Jsp  c J  -  America.gov 
•  tstamp  =  20100307151423606

•  url  =  http://www.america.gov/ar/democracy/global/index.html 
score  for  query:  l  J  Pjj  I  ±><s  3 

•  0.030395675  =  (MATCH)  sum  of:

o  0.022024484  =  (MATCH)  weight(anchor:'J^(>tijd=i(^3^2.0  in  15),  product
of:

■  0.31401145  =  queryWeight(anchor:'J^^j'^^2.0),  product  of:

■  2.0  =  boost 

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.07013911  =  (MATCH)  fieldWeight(anchor:'JA?(a^-)^a^s  in  15),
product  of:

■  1.0  =  tf(termFreq(anchor:'J^^L3jd=^»)=1)

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.015625  =  fieldNorm(field=anchor,  doc=15) 

o  1.12010006E-4  =  (MATCH)  weight(content: Uas^  ii J-Ks'i  in  15),  product 
of: 

■  0.03718038  =  queryWeight(content:UA^j'^V“),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.0030126106  =  (MATCH)  fieldWeight(content:'JASfiij'-Ks®  in
15),  product  of:

■  3.3166249  =  tf(termFreq(content:'dAs?iiJ-k^»)=11)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  8.544922E-4  =  fieldNorm(field=content,  doc=15) 

o  0.008259181  =  (MATCH)  weight(title:'JAs?iijlKs»^1.5  in  15),  product  of:

■  0.23550858  =  queryWeight(title:'J^lij'-l='c53^1.5),  product  of:

■  1.5  =  boost

■  4.488903  =  idf(docFreq=3,  numDocs=131)

■  0.03497641  =  queryNorm


■  0.035069555  =  (MATCH)  fieldWeight(title:'JA5^J^  in  15), 
product  of: 

■  1.0  =  tf(termFreq(title:UAs?iijd=cs»)=1)

■  4.488903  =  idf(docFreq=3,  numDocs=131) 

■  0.0078125  =  fieldNorm(field=title,  doc=15) 

Page  5: 

•  boost  =  1.0000145 

•  digest  =  eed4dd9817b50ffda0aef158be6e4c12

•  lang  =  ar 

•  segment  =  20100307101052 

•  title  =  ^  IJi -  America.gov 

•  tstamp  =  20100307151057483

•  url  =  http://www.america.gov/ar/ 
score  for  query:  l  J  ^  pjj j  l  ±><s  3 

•  0.0021392573  =  (MATCH)  sum  of:

o  0.0021392573  =  (MATCH)  weight(content:'JAffeLA*/*  in  0),  product  of:

■  0.03718038  =  queryWeight(content:UA_s^jjd^A),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.05753726  =  (MATCH)  fieldWeight(content:'J^^J^5  in  0),
product  of: 

■  1.7320508  =  tf(termFreq(content:'J^^lij'-^L5“)=3) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03125  =  fieldNorm(field=content,  doc=0) 

Page  6: 

•  boost  =  0.22865272 

•  digest  =  dc127d214554a59575782c318462f4e8

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  s->. .  America.gov

•  tstamp  =  20100307151128611

•  url  =  http://www.america.gov/ar/multimedia/podcast.html 

score  for  query:  l  j  ^  l  b^  3 

•  6.041371E-4  =  (MATCH)  sum  of:

o  6.041371E-4  =  (MATCH)  weight(content:'JASfiijd^  in  61),  product  of:

■  0.03718038  =  queryWeight(content:'JAs^df-K?0),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.016248815  =  (MATCH)  fieldWeight(content:'JASfiijd^s  in  61),
product  of: 

■  2.236068  =  tf(termFreq(content:CAs,»iij'-HsS)=5) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0068359375  =  fieldNorm(field=content,  doc=61) 

Page  7: 

•  boost  =  0.23039404 

•  digest  =  8ed8fcd743ff1ce5d4c42db83fc549af

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  Uj^'3  '  J'fcS  jA/*  - '  -  America.gov

•  tstamp  =  20100307151136760

•  url  =  http://www.america.gov/ar/amlife.html 
score  for  query:  I  J  pdj  l  -t><#  s 

•  5.4035656E-4  =  (MATCH)  sum  of: 

o  5.4035656E-4  =  (MATCH)  weight(content: UASfii jl-WsS  in  3),  product  of: 

■  0.03718038  =  queryWeight(content:'J^^jj'A^'o),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131)

■  0.03497641  =  queryNorm 

■  0.01453338  =  (MATCH)  fieldWeight(content:IJASf\ijd=c£»  in  3), 
product  of: 

■  2.0  =  tf(termFreq(content:'J^^iij'4aL5o)=4) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0068359375  =  fieldNorm(field=content,  doc=3) 

Page  8: 

•  boost  =  0.22700267 

•  digest  =  c639ba79e6601f1242cee32b3ba640f4

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  -  'Jlj'l-h  -  America.gov

•  tstamp  =  20100307151120668

•  url  =  http://www.america.gov/ar/amlife/people.html 
score  for  query:  I  J  *<s  pjj  1 ±><5 s 

•  4.6796253E-4  =  (MATCH)  sum  of:

o  4.6796253E-4  =  (MATCH)  weight(content: ' Jas?i3j'-Kss  in  7),  product  of: 

■  0.03718038  =  queryWeight(content:'JA_s^jj'-Ks'°),  product  of:

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.012586276  =  (MATCH)  fieldWeight(content:'JA^-)LK£“  in  7), 
product  of: 

■  1.7320508  =  tf(termFreq(content:U^^oj^^o)=3)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0068359375  =  fieldNorm(field=content,  doc=7) 

Page  9: 


•  boost  =  0.22826105

•  digest  =  c33a5dc3f7d8475491bfafcf91c8b283

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  -  America.gov

•  tstamp  =  20100307151153574

•  url  =  http://www.america.gov/ar/econ.html 

score  for  query:  l  j  ^  p jj  I  5 

•  4.6796253E-4  =  (MATCH)  sum  of: 

o  4.6796253E-4  =  (MATCH)  weight(content: ' JAspiiJ-Kp  in  16),  product  of: 

■  0.03718038  =  queryWeight(content:'JA5ptijd=L5S),  product  of: 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.012586276  =  (MATCH)  fieldWeight(content:'JA5ptijd=L5S  in  16), 
product  of: 

■  1.7320508  =  tf(termFreq(content:'JAspiij'-Ks“)=3) 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0068359375  =  fieldNorm(field=content,  doc=16) 

Page  10: 

•  boost  =  0.23110132

•  digest  =  3c7f5c1dc4d604ef275f72043bf8cfc1

•  lang  =  ar

•  segment  =  20100307101102

•  title  =  secondary  Multimedia  -  ju^'LsJ  jg-JI -  America.gov

•  tstamp  =  20100307151158462

•  url  =  http://www.america.gov/ar/multimedia.html 
score  for  query:  l  j  ^  pjjj  l  5 

•  4.6796253E-4  =  (MATCH)  sum  of:

o  4.6796253E-4  =  (MATCH)  weight(content: UAspj j'-Ks®  in  38),  product  of:


■  0.03718038  =  queryWeight(content:'J-\5fL3j^s),  product  of: 

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.03497641  =  queryNorm 

■  0.012586276  =  (MATCH)  fieldWeight(content:'JA^J^K^  in  38),
product  of:

■  1.7320508  =  tf(termFreq(content:UAs?iiJ-Hs»)=3)

■  1.063013  =  idf(docFreq=122,  numDocs=131) 

■  0.0068359375  =  fieldNorm(field=content,  doc=38) 
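
One visible difference between the two analyzers' listings is the idf of the anchor and title clauses: the Arabic Analyzer index reports docFreq=4 out of 130 documents, while the NutchDocumentAnalyzer index reports docFreq=3 out of 131. The sketch below is a minimal illustration, assuming the classic Lucene formula idf = 1 + ln(numDocs / (docFreq + 1)), of how that one-document difference changes the idf factor. The variable names are chosen for this example only.

```python
import math

def idf(doc_freq, num_docs):
    """Inverse document frequency under the assumed classic Lucene formula."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

# (docFreq, numDocs) for the anchor/title clauses as printed for each analyzer.
arabic_analyzer_idf = idf(doc_freq=4, num_docs=130)
nutch_document_analyzer_idf = idf(doc_freq=3, num_docs=131)

print(f"Arabic Analyzer idf:        {arabic_analyzer_idf:.6f}")
print(f"NutchDocumentAnalyzer idf:  {nutch_document_analyzer_idf:.6f}")
print(f"ratio:                      {nutch_document_analyzer_idf / arabic_analyzer_idf:.4f}")
```

Because idf appears in both queryWeight and fieldWeight, it enters each clause score squared, so a modest change in docFreq has a larger effect on the clause weight than the idf ratio alone suggests.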